Thursday, May 14, 2020

Some brief thoughts on how Nanite in UE5 possibly works....

Major caveat:  As of this writing, Epic hasn't really explained much so there's a very good chance I'm totally wrong about everything below.

So, what do we know?
- We know that Nanite takes in huge many-millions of poly meshes and somehow rasterizes them.
- We know that UE4's material model still needs to be used.
- We also know that at least right now, vertex animation isn't supported - only rigid meshes work.
- Epic hired Graham Wihlidal, the King of Clustering, to help with the development of Nanite.

What might we conclude/speculate from this? Nanite likely employs a hierarchy of clusters.  Probably something like 32 triangles per cluster, so an easy way to think of this would be like a BVH, but instead of a binary tree, something like 32 cluster pointers in a node.  Think of this like a BVH32 (or BVH64 if it's 64 triangles) block as your core inner node in a hierarchy of nodes.  Using something like BVH32 would be a little dubious for a raytracer, but for a rasterizing scenario it could possibly work great as it's push instead of pull, but more on this in a second.  In the meantime, consider that 32^6 is roughly 1B polys.  That means that traversing a very small number of levels of this BVH would get you a lot of polygons in a contained hierarchy.
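
To put numbers on that, here's a tiny sketch of how many levels a 32-wide hierarchy needs for a given triangle count.  The function name is made up for illustration, and it assumes the bottom level's 32 triangles count as one level, same as the 32^6 figure does.

```python
def levels_needed(triangle_count: int, branching: int = 32) -> int:
    """Smallest depth d such that branching**d >= triangle_count."""
    depth, capacity = 0, 1
    while capacity < triangle_count:
        capacity *= branching
        depth += 1
    return depth


print(32 ** 6)                        # 1073741824
print(levels_needed(1_000_000_000))   # 6
print(levels_needed(1_000_000))       # 4
```

So even a billion-triangle mesh is only about six hops from root to triangle under these assumptions.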

So how might this work in practice?  Dice the frame into 8x8 tiles.  Take your original visible object list's bounding volume set and test them against all tiles.  This could be done faster by testing at coarser tile sizes first, but whatever.  So, build a list of objects that overlap each working tile.  This will be the list of objects we'll be evaluating.

Now basically texel-overlap test the cluster bounds, descending down the hierarchy per thread.  When testing the sub-cluster bounds, sort all the "hits" based on near-depth of the cluster.  If you end up with a cluster itself smaller than the current pixel, stop descending and simply output the cluster ID instead of a triangle, along with a candidate depth as the local depth and cluster id.  If you actually make it to the bottom cluster level and the cluster is still visible, add the cluster to an append buffer (cluster buffer).  If while descending, you find that the local cluster's bounding near-depth is behind the current stored local depth for that thread, break.  Doing the tree walk part efficiently is no joke, but that's certainly workable, possibly involving a tiny stack (possibly shared across the threads using lane access magic, or stored in LDS) like a typical manually implemented GPU ray-tracer.
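
A CPU-side sketch of that per-pixel walk might look like the following.  Everything here is an assumption for illustration: the `Node` layout, the screen-space rectangle bounds, and the helper names are all invented, and a real version would live in a compute shader with a tiny stack in registers or LDS.  One deliberate difference: where the text says "break", this sketch just skips the occluded subtree, since the stack is only sorted per level rather than globally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    rect: tuple            # (x0, y0, x1, y1) screen-space bounds, in pixels
    near_depth: float      # nearest depth of the bounds
    children: list = field(default_factory=list)


def covers(rect, px, py):
    """True if the rect contains the centre of pixel (px, py)."""
    x0, y0, x1, y1 = rect
    return x0 <= px + 0.5 <= x1 and y0 <= py + 0.5 <= y1


def smaller_than_pixel(rect):
    x0, y0, x1, y1 = rect
    return (x1 - x0) <= 1.0 and (y1 - y0) <= 1.0


def resolve_pixel(roots, px, py):
    """Returns (cluster id written for this pixel, its depth, leaf clusters to rasterize)."""
    best_depth, best_id = float("inf"), None
    leaf_clusters = []                       # stands in for the append buffer
    hits = [n for n in roots if covers(n.rect, px, py)]
    # Sort far-to-near so the nearest hit is popped first.
    stack = sorted(hits, key=lambda n: n.near_depth, reverse=True)
    while stack:
        node = stack.pop()
        if node.near_depth >= best_depth:
            continue                         # behind what we already stored: skip subtree
        if smaller_than_pixel(node.rect):
            best_depth, best_id = node.near_depth, node.node_id
            continue                         # output the cluster id instead of a triangle
        if not node.children:
            leaf_clusters.append(node.node_id)   # bottom level, still visible
            continue
        hits = [c for c in node.children if covers(c.rect, px, py)]
        hits.sort(key=lambda c: c.near_depth, reverse=True)
        stack.extend(hits)
    return best_id, best_depth, leaf_clusters
```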

Once you've resolved all the clusters in your tile, write out your depth buffer and the object/cluster ids, and construct your HTile for the tile if you're on a console.  At that point you've got a Visibility Buffer (mostly complete) and a Depth Buffer (mostly complete).  On PC, reconcile depth test acceleration by a fullscreen PS that exports depth.

Now sort the leftover cluster buffer by id, and do a duplicate reduction on it.  Rasterize these clusters via mesh shading, outputting depth and triangle id to the same visibility buffer, while doing depth testing of course.  Note, when dealing with triangle and cluster ids in the visibility buffer, I'd do something like prepend the visibility id with a 1 bit code to identify whether the payload came from a cluster or triangle (which presumably has object ID + cluster + prim id packed together.. somehow).
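
As a sketch of that 1-bit prefix idea, here's one way a 32-bit visibility id could be packed.  The bit widths (12 object, 12 cluster, 7 primitive) are purely my guess for illustration, as are the function names.

```python
OBJ_BITS, CLUSTER_BITS, PRIM_BITS = 12, 12, 7   # hypothetical split of the 31 payload bits
TRI_FLAG = 1 << 31                               # top bit: 1 = triangle, 0 = cluster


def pack_cluster(obj, cluster):
    assert 0 <= obj < (1 << OBJ_BITS) and 0 <= cluster < (1 << CLUSTER_BITS)
    return (obj << (CLUSTER_BITS + PRIM_BITS)) | (cluster << PRIM_BITS)


def pack_triangle(obj, cluster, prim):
    assert 0 <= prim < (1 << PRIM_BITS)
    return TRI_FLAG | pack_cluster(obj, cluster) | prim


def unpack(value):
    """Returns (is_triangle, object id, cluster id, primitive id or None)."""
    is_triangle = bool(value & TRI_FLAG)
    obj = (value >> (CLUSTER_BITS + PRIM_BITS)) & ((1 << OBJ_BITS) - 1)
    cluster = (value >> PRIM_BITS) & ((1 << CLUSTER_BITS) - 1)
    prim = value & ((1 << PRIM_BITS) - 1)
    return is_triangle, obj, cluster, (prim if is_triangle else None)
```

The shading pass can then branch on the top bit to decide whether it's fetching real triangle interpolants or the cluster-level stand-ins.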

Now processing the visibility data itself wouldn't be too hard.  Knowing either the cluster or the triangle should be enough to get to the interpolants you'd normally need to run the equivalent of a pixel shader for the material/triangle.  At this point it's some variant of visibility buffer processing... which wouldn't be trivial to do, but is certainly doable.  I'm going to mostly hand-wave away how one gets visibility buffering working with an Unreal-style material model.  Other people have done this before.

Oh wait a second - I glossed over how you do this for a cluster, right?  Well, the cluster is basically an AABB containing all the vertices underneath it (or all clusters beneath it).  So what makes sense is to store not simply a positional AABB, but for each cluster an AABB of *all* meaningful interpolants.  This has some interesting side effects - it implies range of UVs, which implies UV coverage, which implies sampling mip levels when sampling textures for materials.  One could presumably pick the midpoint for the bounds and use that as the candidate UV for the cluster.  I'm not 100% sure how best to deal with normals for the cluster... perhaps the cluster stores an average normal or normal cone for its level to help in backface culling anyway during the above described traversal, and that average normal is used.  At some point presumably that breaks, but one supposes that if the cluster has a normal cone over 180 degrees, one can just assume the normal faces the camera anyway.

All this cluster data should compress well, as all interpolants are relative to the cluster bounds, and vertex indices could be encoded in a small number of DWORDs if one did it naively.  Clusters should compress down to nothing (relative to the original data), without needing anything particularly fancy.
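
To make the "relative to the cluster bounds" part concrete, here's a minimal sketch of quantizing an interpolant (a position component, a UV, whatever) against its cluster's min/max.  The 8-bit default and the function names are assumptions, not anything Epic has described.

```python
def quantize(value, lo, hi, bits=8):
    """Store value as an integer step between the cluster's lo and hi bounds."""
    if hi == lo:
        return 0
    levels = (1 << bits) - 1
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)          # clamp in case of slight overshoot
    return round(t * levels)


def dequantize(q, lo, hi, bits=8):
    levels = (1 << bits) - 1
    return lo + (q / levels) * (hi - lo)
```

The worst-case error is half a step, i.e. (hi - lo) / (2 * 255) at 8 bits, and since the bounds shrink as you descend the hierarchy, so does the error.
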
There are other advantages to doing this.  Since all the geometry is hidden inside clusters at such extremes of level, you can presumably stream the geometry.  You can also book-keep to track what the lowest level cluster you happen to need is, and stream the next one as soon as possible.  There's such a large order-of-magnitude change going on here that LODs of data can be pretty easily managed, and loaded into a virtual space.  There are potentially pretty clever things you can do where all clusters (regardless of object) can live in a common cluster pool, and have mapping tables between objects and their virtual cluster ids, which could be stored in a per-object offset table.

There are a few obvious limitations to this idea:  It assumes a lot of preprocessing on geometry, so one cannot animate vertices or vertex attributes trivially.  No dynamic meshing of raw triangles would be supported.  It also presumes that the object is opaque (at least for a more straightforward implementation...) so at least for the moment, masking and transparency wouldn't be supported either.

Anyway - those are some early thoughts.  I talk a lot about AABBs, but it's equally possible they're just spheres - in a lot of ways spheres can be easier to deal with in a hierarchy (although dealing with nonlinear scaling isn't fun).  It'll be interesting to see if I'm close or totally missed the boat here.  It wouldn't shock me if I'm totally off base, but I do think something like this could work.

I should also point out - the complexity involved in getting everything I'm suggesting to work is pretty high.  I'm certainly hand-waving my way through a lot of complexity of implementation, and getting this to run fast enough to be competitive with standard rasterization when your meshes aren't millions of polygons wouldn't be particularly easy either.  Kudos to the entire team at Epic for figuring this stuff out.  The imagination involved is incredible, and it really is groundbreaking work.


Tuesday, August 21, 2018

Some thoughts re Raytracing post-Siggraph 2018.

Okay, so unless you've spent the last week or so living under a rock or simply don't care about graphics, you've probably heard by now about Turing, nVidia's upcoming GPU architecture that handles realtime raytracing.  It's not the first push towards this at the hardware level, but it's certainly the first to get this level of a marketing push and by far the most actively promoted to ISVs (ie, gamedevs).  It's also well timed in that it's actually clear to the game-graphics community, for perhaps the first time, that one can yield some kind of practical win from raytracing despite very limited ray-per-pixel counts.

It's exciting stuff, to be sure.  Even if there are lots of open-ended questions/doubts about how one handles various types of content or situations it still adds a rush knowing that yet again the real-time rendering half-life of about 5 years has kicked in and..  half of what you took for granted flies out the window along with scrambling to learn new paradigms.

All that said, it's still not really clear how a lot of this stuff will work in practice.  The announced cards are hardly cheap, and it's still unclear where other IHVs like AMD and Intel will fall in the mix.  So it's not like we can count on customers having the hardware in large numbers for quite a while.. which means hybridized pluggable pipelines (ie, working your shadowmap shadows to result in a resolved mask that can be swapped for a high quality raytraced result).

Even then, it's not clear what best practices are for gamedevs to consider for all sorts of common scenarios we encounter daily at the moment.  A straightforward example of this to consider would be a non-bald human hero character standing in a forest.  
- Raytracing relies on rebuilds/refits of BVH structures to describe the dynamic elements in the scene but it's certainly not clear how best to manage that, and it seems that currently no one's really sure.
- Do you separate your dynamic elements into a separate BVH from your statics, to reduce the refit burden?  But that means needing to double your ray-testing... probably everywhere.  
- Presumably the BVH needs to reside in video memory for the raytracing hardware to be effective, but what's the practical memory consumption expected?  How much memory do I have for everything else?  Is it fair to assume walking a system-memory-based BVH is something of a disaster?  Given the memory reclamation that can happen to an app, I presume one must ensure a BVH can never exceed 50% of total video memory.
- There's some minor allowance for LOD-ish things via ray-test flags, but what are the implications of even using this feature?  How much more incoherent do I end up if my individual rays have to decide LOD?  Better yet, my ray needs to scan different LODs based on distance from ray origin (or perhaps distance from camera), but those are LODs *in the BVH*, so how do I limit what the ray tests as the ray gets further away?  Do I spawn multiple "sub-rays" (line segments along the ray) and give them different non-overlapping range cutoffs, each targeting different LOD masks?  Is that reasonable to do, or devastatingly stupid?  How does this affect my ray-intersection budget?  How does this affect scheduling?  Do I fire all LOD's rays for testing at the same time, or do I only fire them as each descending LOD's ray fails to intersect the scene?
- How do we best deal with texture masking?  Currently hair and leaves are almost certainly masked, and really fine grain primitives almost certainly have to be.  I suspect that while it's supported, manual intersection shaders that need to evaluate the mask are best avoided if at all possible for optimal performance.  Should we tessellate out the mask wherever possible?  That might sound nice, but could easily turn into a memory consuming disaster (and keep in mind, the BVH isn't memory free, and updating it isn't performance free either).  It might be tempting to move hair to a spline definition like the film guys do, but that's likely just not practical as things still have to interop well with rasterization and updating a few hundred thousand splines, or building an implicit surface intersection shader to infer the intersections doesn't sound like fun (well, actually it does, but that's beside the point).
- Even something like a field of grass becomes hugely problematic, as every blade is presumably moving and there are potentially millions of the little bastards in a fairly small space.  It's basically just green short hair for the ground.  Maybe it ends up procedurally defined as suggested before and resolved in an intersection shader, but again... confusing stuff to deal with.
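
For what it's worth, the "sub-ray" idea from the LOD bullet above is easy enough to sketch on paper.  This just chops one ray into non-overlapping [t_min, t_max] segments, each tagged with a one-bit LOD mask.  The function name, the mask encoding, and the distance thresholds are all made up here, and whether firing rays this way is reasonable on real hardware is exactly the open question.

```python
def split_ray_by_lod(t_max, lod_end_distances):
    """lod_end_distances: ascending distances at which LOD 0, 1, ... stop applying.
    Returns a list of (t_min, t_max, lod_mask) sub-ray segments."""
    segments = []
    start = 0.0
    for lod, end in enumerate(lod_end_distances):
        if start >= t_max:
            break
        seg_end = min(end, t_max)
        segments.append((start, seg_end, 1 << lod))
        start = seg_end
    if start < t_max:
        # Everything past the last threshold uses the coarsest LOD.
        segments.append((start, t_max, 1 << len(lod_end_distances)))
    return segments


print(split_ray_by_lod(100.0, [10.0, 40.0]))
# [(0.0, 10.0, 1), (10.0, 40.0, 2), (40.0, 100.0, 4)]
```

Firing the segments nearest-first and stopping at the first hit would be the "only fire as each LOD's ray fails" variant; firing them all at once is the other.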

Or maybe these cases get punted on.  That would be disappointing, but certainly simplifies things.  We rasterize a gbuffer initially and when we need to spawn rays, we just assume our forest is barren, grass is missing, and our character is bald.  We correct for these mistakes via current methods, which are hardly perfect, but better than nothing.  This makes things a lot more complicated, though:
- You can drop leaves from the trees for shadow casting, but then you're going to still need leaf shadows from some processes - presumably shadowmapping.  How do you make the two match up (since presumably raytracing devastates the SM quality comparison)?  
- Maybe for AO you trace a near field and far field, and for near field you ignore the leaves and for far field you use an opaque coarse leaf proxy?  Maybe this can work for shadows as well in certain cases if you apply the non-overlapping range-ray idea mentioned earlier, assuming they're going to get softened anyway?

There are all sorts of other problems too, related to BVH generation... 
- Say I've got a humanoid, and build the BVH against a T-pose initially.  How does the refit handle triangles massively changing orientation?  How well does it handle self-intersection (which sadly, happens more than we might like)?  What happens when my character is attempting to dodge and rolls into a ball to jump out of the way?  Do these degenerate cases cause spikes, as the BVH degrades and more triangles end up getting tested?  Does my performance wildly fluctuate as my character animates due to these issues?
- If I have an open world game, how do I stream in the world geometry?  At some point in the future, when the BVH format is locked down and thus directly authored to, maybe this becomes straightforward, but for now... yikes.  Does one have to rethink their entire streaming logic?  Maybe a BVH per sector (assuming that's even how you divide the world), although that causes all sorts of redundant ray-fires.  Maybe you manually nest BVHs by cheating - use the custom intersection from a top level BVH to choose from amongst which of lower BVHs to intersect, so that you can have disparate BVHs but don't have to rayfire from the top-most level?  Who knows?

Partial support is certainly better than zero support, but is raytracing as sexy of a solution when it fails to consider your hero character and/or other dynamics?  There's an obvious desire for everything's appearance to be unified, but it wasn't soooo long ago that having entirely different appearance solutions for the world and for characters was the norm, and that the focus was primarily on a more believable looking world of mostly static elements (say, all the original radiosity-lightmapped games).  Even now there tends to be a push to compromise on the indirect lighting quality of dynamics on the assumption they're a minority of the scene.  Perhaps a temporary step backwards is acceptable for an interlude, or maybe that horse has already left the barn?

This post might all sound really negative, but it's really not meant to be that way - raytracing for practical realtime scenarios is still in its infancy and it's just not realistic for every problem to be solved (or at least, not solved well) on day-1.  To make matters worse, in many cases while working with gamedevs is clearly a good call, it's certainly a big responsibility of nVidia and other IHVs to not prematurely lock down the raytracing pipeline to one way of working simply because they talked to one specific dev-house who have a particular way of thinking and/or dealing with things.

These problems are currently open ones, but a lot of smart people will do their best to come to reasonable solutions over the next few years.  Raytracing remains a needed backbone for a ton of techniques, so a desire to see it working isn't going away anytime soon.  Personally I'm pretty excited to explore these problems, and really looking forward to the visual jump we'll be able to see in games in the not-so-distant future.

Sunday, March 20, 2016

Everything Old is New Again!

This GDC seemed to be more productive than most in recent memory, despite spending most of it in meetings, as per usual.  Among many other useful things was Dan Baker's talk on Object Space Shading (OSS) during the D3D Dev Day.  I think it's probably a safe bet that this talk will end up getting cited a lot over the next couple years.  The basic gist of it was:
- Ensure all your objects can be charted.
- Build a list of visible objects.
- Attempt to atlas all your visible charts together into a texture page, allocating them space proportional to their projected screen area.
- Assuming you've got world-space normals, PBR Data, and position maps, light into this atlased texture page, outputting final view-dependent lit results.
- Generate a couple mips to handle varying lookup gradients.
- Now render the screen-space image, with each object simply doing a lookup into the atlas.
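
The "space proportional to projected screen area" step is the interesting one, so here's a rough sketch of it.  This is not Baker's implementation: the square-chart simplification, the function name, and the atlas size are my assumptions, and actual packing of the charts into the page is ignored entirely.

```python
import math


def allocate_chart_sizes(projected_areas, atlas_size=4096):
    """projected_areas: {object name: projected screen area}.
    Returns {object name: side length in texels of a square chart}."""
    total = sum(projected_areas.values())
    budget = atlas_size * atlas_size
    sizes = {}
    for name, area in projected_areas.items():
        texels = budget * area / total if total > 0 else 0.0
        # Floor so the summed chart area never exceeds the page; clamp to 1 texel.
        sizes[name] = max(int(math.sqrt(texels)), 1)
    return sizes


print(allocate_chart_sizes({"hero": 3.0, "rock": 1.0}, atlas_size=64))
# {'hero': 55, 'rock': 32}
```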

Pretty elegant.  Lots of advantages in this:
- essentially you keep lighting costs fixed (proportional to the super-chart's resolution)
- your really expensive material and lighting work doesn't contend with small triangle issues.
- all your filtering while lighting and shading is brutally cheap.
- because things are done in object space, even if you undersample, things should look "okay".

Now, as suggested there are a few issues that I think are mostly ignore-able due to the fact that this was designed around RTSs - I think in a general case you'd really want to be a lot more careful about how things get atlassed as I think something like an FPS can fairly easily create pathological cases that "break" if you're just using projected area.  For example, imagine looking at a row of human characters forming a Shiva-pose, by which I mean, standing one behind the other with their arms at different positions.  Of course, this doesn't really break anything, but it does mean you're likely to oversubscribe the atlas and have quality bounce around depending on what's going on.  Even so, still pretty interesting to play around with.

So, I'm going to propose a different way to think about this which is actually in many ways so obvious I'm surprised more people didn't bring it up when we excitedly discussed OSS - lightmapping.  Ironically this is something people have been trying to get away from, but I guess I'll propose a way to restate the problem:

Consider that not everything necessarily needs the same quality lighting, or needs to be updated every frame.  So let's start by considering that we could maybe build three atlases - one for things that need updating every frame, and two low frequency atlases, which we update on alternating frames.  Now if we assume we're outputting final lighting this might be a bit problematic because specularity is obviously view dependent and changing our view doesn't change the highlight.
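
The three-atlas schedule is trivial, but just to pin it down (the atlas names are placeholders):

```python
def atlases_to_update(frame_index):
    """One atlas is relit every frame; the two low-frequency ones alternate."""
    low_freq = "low_freq_a" if frame_index % 2 == 0 else "low_freq_b"
    return ["every_frame", low_freq]


for frame in range(4):
    print(frame, atlases_to_update(frame))
```

Each low-frequency atlas ends up relit at half rate, so per-frame lighting cost is the high-frequency atlas plus one of the other two.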

Okay, so what if we don't output final shading but instead light directly into some kind of lighting basis?  For example, the HalfLife-2 basis (Radiosity Normal Maps or RNMs), or maybe Spherical Gaussians as demo'd by Ready At Dawn.  Now obviously your specular highlights will no longer be accurate as you're picking artificial local light directions.
As well, RNMs and much of their ilk tend to be defined in tangent-space, not world-space, so that's somewhat less convenient, as instead of needing to provide your lighting engine with just a normal, you actually need the tangent basis instead, so you can rotate the basis into world-space before accumulating coefficients.  But it's been demonstrated by Far Cry 4 that you could do this by encoding a quaternion in 32 bits, so hardly impossible.  And FWIW, RNMs tend to be fairly compressible (6 coefficients, using an average color, are typically fine).
Anyway, storing things in a basis in this way provides a number of interesting advantages that should be pretty familiar:
- Lighting is independent of material state/model/BRDF.  You don't need the albedo, metalness, roughness, etc.  This means that in cases where your materials are animating, you can still provide the appearance of high frequency material updates.  You can still have entirely different lighting models from object to object if you so choose.  Because of this, all you need to initially provide to the lighting system is the tangent basis and world-space position of each corresponding texel.  Your BRDF itself doesn't matter for building up the basis, so you can essentially do your BRDF evaluation when you read back the basis in a later phase (probably final shading).  This is analogous to, say, lighting into an SH basis where you simply project the lights into SH and sum them up - the SH basis texel density can be pretty sparse while still providing nice looking lighting results so long as your lighting variation frequency is proportional to less than the lighting density.  Of course, specularity can be problematic depending on what you're trying to do, but more on that below.
- As said, lighting spatial frequency doesn't have to be anywhere near shading frequency and can be considerably lower as lerping most lighting bases tends to produce nice results without affecting the final lighting quality significantly (with the exception of shadowing, typically).
- Specular highlights, while inaccurate due to compressing lighting into a basis, can properly respond to SpecPower changes quite nicely.  There's also nothing stopping you from still using reflections as one normally would during the shading phase.  Lots of common lightmap+reflection tricks could be exploited here as well.  If you end up only needing diffuse for some reason, SH should be adequate (so long as you still cull backfacing lights), and would remove the tangent-space storage requirements - though you'd need to track vertex normals.
- There's no law that says *all* lighting needs to be done uniformly in the same way.  You could do this as a distinctly separate pass that feeds the pass that Baker described, or process them in the forward pass should the need arise.
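
Here's a minimal sketch of lighting into an HL2-style basis and reading it back later.  It uses scalar intensities instead of colors, rotates the light into tangent space rather than the basis into world space (equivalent, just simpler to write), and uses the plain clamped-dot weighting at both ends.  That weighting and all the function names are my assumptions, not a claim about how any shipped engine does it.

```python
import math

# The three HL2 tangent-space basis directions (orthonormal).
HL2_BASIS = (
    (-1 / math.sqrt(6),  1 / math.sqrt(2), 1 / math.sqrt(3)),
    (-1 / math.sqrt(6), -1 / math.sqrt(2), 1 / math.sqrt(3)),
    (math.sqrt(2 / 3),   0.0,              1 / math.sqrt(3)),
)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def to_tangent_space(v, tangent, bitangent, normal):
    return (dot(v, tangent), dot(v, bitangent), dot(v, normal))


def accumulate(lights_ws, tangent, bitangent, normal):
    """lights_ws: list of (unit direction to light in world space, intensity).
    Note that no material/BRDF data is needed here, only the tangent frame."""
    coeffs = [0.0, 0.0, 0.0]
    for direction_ws, intensity in lights_ws:
        l_ts = to_tangent_space(direction_ws, tangent, bitangent, normal)
        for i, basis_dir in enumerate(HL2_BASIS):
            coeffs[i] += intensity * max(0.0, dot(l_ts, basis_dir))
    return coeffs


def shade_diffuse(coeffs, normal_ts):
    """Read-back phase: weight each stored coefficient by the per-pixel normal."""
    return sum(c * max(0.0, dot(normal_ts, b)) for c, b in zip(coeffs, HL2_BASIS))
```

With a single unit light straight along the surface normal and a flat normal map, `shade_diffuse` returns 1.0, which is what N dot L would give; the approximation gets worse as clamping kicks in.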

And last but not least, if you're moving lighting into a basis like this, there's no rule that says you need to update everything every frame.  So for example, you could partition things into update-frequency-oriented groups and update based on your importance score.  This would also allow for your light caching to be a little more involved as you could now keep things around a lot longer (in a more general LRU cache).  For example, you could have very very low res lighting charts per object, all atlased together into one big page that's built on level load as a fallback if something suddenly comes into view, loose bounce determination, or distant LODs.  You could even PRT the whole thing into a super-atlas, assign each object a page, and treat the whole thing as a cache that you only update as needed!

Anyway, just some ideas I've been playing around with that I figured I'd share with everyone.


Thursday, November 15, 2012

Dynamic Lighting in Mortal Kombat vs DC Universe

MKvsDC (as the title is colloquially referred to at NetherRealm) was our very first foray into the X360/PS3 generation of hardware. As a result, it featured a lot of experimentation and a lot of exploration to figure out what might and might not work. Unlike our more recent fighters, Mortal Kombat (2011) and Injustice: Gods Among Us (which is still getting completed and looking pretty slick so far), MKvsDC was actually a full-on 3D fighter in the vein of our last-gen (PS2/GameCube/Xbox1) efforts. (Note, once again, everything here has been explained publicly before, just in a bit less detail).

A lot of figuring out how to get things going in MKvsDC was trying to figure out what was doable and at the same time acceptable. The game had to run at 60Hz, but was both being built inside an essentially 30Hz targeted engine and had to look on par with 30Hz efforts. I remember quite early on in development Tim Sweeney and Mark Rein came out to visit Midway Chicago. I remember going for lunch with the Epic crew, myself spending the majority of it speaking with Tim (extremely nice and smart guy) about what I was going to try to do to Unreal Engine 3 to achieve our visual and performance goals. Epic was very up front and honest with us about how UE3 was not designed to be optimal for a game like ours. To paraphrase Sweeney, since this was years ago, "You're going to try to get UE3 to run at 60Hz? That's going to be difficult."

He wasn't wrong. At the time, UE3 was pretty much a multipass oriented engine (I believe it still theoretically is if you're not light baking, using Dominant Lights, or using Lighting Environments, though no one would ship a game that way). Back then there were still a lot of somewhat wishy-washy ideas in the engine like baking PRT maps for dynamic objects, which ended up largely co-opted to build differential normal maps. Lots of interesting experimental ideas in there at the time, but most of those were not terribly practical for us.

So Item #1 on my agenda was figuring out how to light things cheaper, but if possible, also better. Multipass lighting was out - the overhead of multipass was simply too high (reskinning the character per light? I don't think so!). I realize I'm taking this for granted, but I probably shouldn't - consider that one of the obvious calls early on was to bake lighting wherever possible. Clearly this biases towards more static scenes.

Anyhoo, we had really two different problems to solve. The first was how were we going to light the characters. The fighters are the showcase of the game, so they had to look good, of course. The second problem was how were we going to handle the desire for dynamic lighting being cast by the various effects the fighters could throw off. We handled it in the previous gen, so there was a team expectation that it would somehow be available "as clearly we can do even more, now".

So, the first idea was something I had briefly played around with on PS2 - using spherical harmonics to coalesce lights, and then light with the SH's directly. Somewhat trivially obvious now, it was a bit "whackadoo" at the time. The basics of the solution were already rudimentarily there with the inclusion of Lighting Environments (even if the original implementation wasn't entirely perfect at the time). Except instead of extracting a sky-light and a directional as Epic did, we would attempt to just directly sample the SH via the world-space normal.
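
For anyone who hasn't done this before, here's roughly what coalescing lights into SH and sampling by normal looks like.  This is a generic textbook-style sketch using only the first two SH bands (4 coefficients), not the actual MKvsDC shader, and the function names are made up.

```python
import math

Y0 = 0.282095                # band-0 real SH basis constant
Y1 = 0.488603                # band-1 real SH basis constant
A0 = math.pi                 # cosine-lobe convolution weight, band 0
A1 = 2.0 * math.pi / 3.0     # cosine-lobe convolution weight, band 1


def sh_basis(d):
    x, y, z = d
    return (Y0, Y1 * y, Y1 * z, Y1 * x)


def project_lights(lights):
    """lights: list of (unit direction to light, (r, g, b)), assumed at infinity.
    Returns 4 RGB coefficients; cost is independent of light count once built."""
    coeffs = [[0.0, 0.0, 0.0] for _ in range(4)]
    for direction, color in lights:
        basis = sh_basis(direction)
        for i in range(4):
            for c in range(3):
                coeffs[i][c] += basis[i] * color[c]
    return coeffs


def eval_diffuse(coeffs, normal):
    """Sample the coalesced lighting with the world-space normal."""
    basis = sh_basis(normal)
    weights = (A0, A1, A1, A1)
    return tuple(
        max(0.0, sum(weights[i] * basis[i] * coeffs[i][c] for i in range(4)))
        for c in range(3)
    )
```

With only two bands, a single white light reconstructs to about 0.25 + 0.5 * (N dot L) before clamping, so it's softer and wrappier than true Lambert, which is a decent look for characters anyway.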

This worked great, actually. Diffuse results were really nice, and relatively cheap for an arbitrary number of lights (provided we could safely assume these were at infinity). Specularity was another matter. Using the reflection vector to lookup into the solution was both too expensive and of dubious quality. It somewhat worked, but it didn't exactly look great.

So after playing around with some stuff, and wracking my brain a little, I came up with a hack that worked pretty decently given that we were specifically lighting characters with it. In essence, we would take the diffuse lighting result, use that as the local light color, and then multiply that against the power-scaled dot between the normal and eye vector. This was very simple, and not physically correct at all, but surprisingly it worked quite nicely and was extremely cheap to evaluate.
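
In code form the hack is about as small as it sounds.  This is reconstructed from the description above rather than copied from the shader, and the names are placeholders.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fake_specular(diffuse_rgb, normal, eye, spec_power):
    """Diffuse result reused as the light color, scaled by pow(N dot E, power)."""
    n_dot_e = max(0.0, dot(normal, eye))
    highlight = n_dot_e ** spec_power
    return tuple(c * highlight for c in diffuse_rgb)
```

Since there's no real light direction involved, the highlight always sits where the surface faces the camera, which is why it's not remotely physically correct.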

But, then Steve (the art director) and Ed came to me asking if there was anything we could do to make the characters pop a little more. Could we add a rim-lighting effect of some kind? So, again seeking something cheap, I tried a few things, but the easiest and cheapest thing seemed to be taking the same EdotN term and playing with it. The solution I went with was basically something like (going from memory here):

_Clip = 1 -(E dot N);
RimResult = pow(_Clip, Falloff)*(N dot D) * (_Clip > Threshold)

Where E is the eye/view vector, N the world-space normal, D a view-perpendicular vector representing the side we want the rim to show up on, and Falloff how sharp we want the highlight to seem. Using those terms provided a nice effect. Some additional screwing around discovered the last part - the thresholding.
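
Here's the same formula as something you can actually run and poke at.  It's a direct transcription of the two lines above with no extra clamping, so N dot D can go negative on the far side; in a shader you'd presumably saturate that, but that's my assumption and not part of the original.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rim_result(E, N, D, falloff, threshold):
    """E: eye/view vector, N: world-space normal, D: view-perpendicular side vector."""
    clip = 1.0 - dot(E, N)
    mask = 1.0 if clip > threshold else 0.0     # the hard threshold
    return (clip ** falloff) * dot(N, D) * mask
```

Try a low falloff with a high-ish threshold to see the abrupt cut-off that reads like chrome mapping.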

This allowed for some truly great stuff. Basically this hard thresholds the falloff of the rim effect. So, this allows for, when falloff is high and threshold is as well, a sharp highlight with a sharp edge to it, which is what Steve wanted. Yet, if you played with it a bit other weirder things were possible, too. If you dropped the falloff so it appeared more gradual and yet hard thresholded early this gave a strange cut-off effect that looked reminiscent of metal/chrome mapping!

To further enhance all of this, for coloring I would take the 0th coefficients from the gathered SH set and multiply the rim value by that, which gave it an environmental coloring that changed as the character moved around. This proved so effective that initially people were asking how I could afford environment mapping in the game. All from a simple hard thresholding hack. In fact, all the metal effects on characters in MKvsDC are simulated by doing this.

Okay, so characters were covered. But environment lighting... yeesh. Once again, multipass wasn't really practical. And I knew I wanted a solution that would scale, and not drag the game into the performance gutter when people started spamming fireballs (or their ilk). What to do....

So it turned out that well, these fireball-type effect lights rarely hung around for very long. They were fast, and moved around a lot. And yet, I knew for character intros, fatalities and victory sequences simply adding them into the per-object SH set wouldn't prove worthwhile, because we'd want local lighting from them. Hmm...

So, I ended up implementing two different approaches - one for characters and a second for environments. For characters, I did the lighting in the vertex shader, hardcoding for 3 active point lights, and outputting the result as a single color interpolator into the pixel shader. This was then simply added in as an ambient factor into the diffuse lighting. As these lights tended to be crazy-over bright, the washing out of per-pixel detail didn't tend to matter anyway.

Environments were more challenging though. As tessellation of the world tended to be less consistent or predictable, the three lights were diffuse-only evaluated either per-pixel or per-vertex (artist selectable). When using per-vertex results, again a single color was passed through, but modulated against the local per-pixel normal's Y component (to fake a quasi ambient occlusion). This worked well enough most of the time, but not always. If you look carefully you can see the detail wash out of some things as they light up.

To keep track of the effect lights themselves, a special subclass of light was added that ignored light flags and was managed directly in a 3-deep FIFO so that the designers could spawn lights at will without having to worry about managing them. When dumped out of the end of the FIFO lights wouldn't actually be deleted of course, merely re-purposed and given new values. Extinguished lights were given the color black so they'd stop showing up. Objects and materials had to opt in to accepting dynamic lighting for it to show up, but anything that did was always evaluating all 3 lights whether you could see them or not.
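
The bookkeeping amounts to a fixed ring of three lights.  Here's a sketch of the idea; the class and field names are invented and this is not the UE3 code.

```python
BLACK = (0.0, 0.0, 0.0)


class EffectLightPool:
    """Fixed-size FIFO of effect lights: spawning recycles the oldest slot."""

    def __init__(self, size=3):
        self.lights = [{"position": (0.0, 0.0, 0.0), "color": BLACK} for _ in range(size)]
        self.next_slot = 0

    def spawn(self, position, color):
        light = self.lights[self.next_slot]     # re-purposed, never deleted
        light["position"] = position
        light["color"] = color
        self.next_slot = (self.next_slot + 1) % len(self.lights)
        return light

    def extinguish(self, light):
        light["color"] = BLACK                   # black contributes nothing when shaded
```

Because the shaders always evaluate exactly three lights, the cost is constant no matter how many fireballs get spammed.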

Ironically about 3 months before shipping I accidentally turned off the effect lights showing up on characters and didn't realize until the very last minute when it was pointed out by QA (switched back on at the 11th hour!), which is why you'll find very few screenshots online created by our team with obvious effect lights showing up on the fighters. Oops!

Sunday, November 4, 2012

Handling Shadows in Mortal Kombat (2011)

So I haven't posted anything in... well, so long this blog is mostly defunct. But I figured it'd be worth posting something again. I can't really talk about Injustice: Gods Among Us's tech until we're done, and most likely shipped it. There's some cool graphics tech in there I'd love to go into... but it's too early. Ideally a GDC-like forum (though I won't be presenting at GDC... too turned off by my experience trying to submit something to GDC11) would be the best place. We'll see. Note, very little of what's described below applies to Injustice. FWIW I've talked about this publicly before (at Sony DevCon I think), but this explains things a bit more thoroughly.

I figured I'd talk a little about how shadows were dealt with in Mortal Kombat (aka MK9). The key thing to keep in mind is that in MK9 the performance envelope that could be dedicated to shadows was extremely limited. And yet, there were some conflicting goals of having complex looking shadow/light interaction. For example, our Goro's Lair background features a number of flickering candles casting shadows on walls, along with the typical grounding shadow one expects.

Goro's Lair concept art

So, what did we do? We cheat. A lot. But everyone who can cheat in game graphics *should* cheat, or imho, you're doing it wrong. Rendering for games is all about cheating (or to be better about wording - being clever). It's all about being plausible, not correct.

The key to keeping lighting under control in Mortal Kombat is rethinking light management. Unlike a lot of games, Mortal Kombat has the distinct advantage of a very (mostly) controlled camera and view. There are hard limits to where our shadow casters can go, what our lights do, and therefore where shadows might appear. Thus, unlike probably every other engine ever made, we do NOT compute caster/receiver interaction at runtime. Instead, we require artists to explicitly define these relationships. This means that in any given environment we always know where shadows might show up, we know which lights cast shadows, and we know which objects are doing the casting.

So, I'm not a total jerk in how this is exposed to the artists. They don't have to directly tie the shadow casting light to the surface. Instead they mark up surfaces for their ability to receive shadows and what kind of shadows the surface can receive. Surfaces can receive the ground shadow, a spotlight shadow, or both. You want two spotlights to shadow-overlap? Sorry, not supported. Redesign your lighting. This might sound bad, but it ends up with shadows being spread across the level rather than "locally concentrated". More on this shortly...

As everything receiving shadows in MK9 is at least partially prelit, shadows always down-modulate. All shadows cast by a single light source are presumed to collate into a single shadow-map, and all "maps" actually occupy a single shadow texture. So we can practically handle four shadows simultaneously being constructed - beyond that, the overhead of building the maps starts to get crazy. And at a certain point we're risking quality to the point where it's likely not worth it. As we only have a few characters on-screen casting shadows, going any further just isn't worth the trouble, either.
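To make "surfaces opt in, shadows only down-modulate" a bit more concrete, here's a tiny sketch in Python. It is not the actual shader. ShadowReceive and shade are names I made up for this example, and multiplying the two visibility terms together when a surface receives both shadows is an assumption (taking the minimum would be just as plausible).

```python
from enum import Flag, auto


class ShadowReceive(Flag):
    # Hypothetical per-surface markup: which shadows this surface may receive.
    NONE = 0
    GROUND = auto()
    SPOT = auto()


def shade(prelit_rgb, receive, ground_visibility=1.0, spot_visibility=1.0):
    """Darken an already-lit color by the shadows the surface opted into.

    Visibility is 1.0 when fully lit and 0.0 when fully shadowed, so the
    result can only ever be darker than the prelit input.
    """
    factor = 1.0
    if ShadowReceive.GROUND in receive:
        factor *= ground_visibility
    if ShadowReceive.SPOT in receive:
        factor *= spot_visibility  # assumption: the two terms simply multiply
    return tuple(c * factor for c in prelit_rgb)
```

A surface marked ShadowReceive.NONE never pays for a shadow lookup at all, which is the whole point of making the artists mark things up.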

So, in a level like Goro's Lair in MK9 we can have a number of flickering candle shadows on the walls, giving it a really dramatic spooky look, especially when combined with the nice wet-wall shader work by the environment team. We can handle this cost by knowing explicitly that only a max of four shadows ever update, and that the local pixel shader for any given wall section only has to handle a given specific spot shadow projection. This allows the artists to properly balance the level's performance costs and ensure they stay within budget (which for shadow construction is absurdly low).
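Here's a sketch of how four shadow maps might share one texture, in Python. The 2x2 grid layout, the function names and the square texture are all assumptions on my part for illustration. The only things taken from the text are the single shared shadow texture and the cap of four shadows.

```python
MAX_SHADOWS = 4  # hard cap on shadows that update at once


def atlas_viewports(texture_size, count=MAX_SHADOWS):
    """Return one (x, y, width, height) viewport per shadow-casting light.

    Assumes a square texture split into a 2x2 grid of equal tiles.
    """
    if not 0 <= count <= MAX_SHADOWS:
        raise ValueError("at most %d shadows are supported" % MAX_SHADOWS)
    half = texture_size // 2
    return [((i % 2) * half, (i // 2) * half, half, half) for i in range(count)]


def atlas_uv(viewport, texture_size, u, v):
    """Remap a light-space (u, v) in [0, 1] into that light's atlas tile."""
    x, y, w, h = viewport
    return ((x + u * w) / texture_size, (y + v * h) / texture_size)
```

Since each wall section knows ahead of time which tile its one spot shadow lives in, the remap can be baked down to a constant scale and offset per surface.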

For texture projection (aka gobos in our engine's terminology) the same logic applies. You can get either subtractive gobos (say, the leaves in Living Forest) or additive ones (the stained glass in Temple) in a level, but not both. You can have multiple gobos in a level, but only one can hit a particular object at a time. Objects explicitly mark up that they can receive the gobos, and then even pick how complex the light interaction is expected to be to keep shader cost under control (does the additive gobo contribute to the lighting equation as a directional light or merely as ambient lighting).
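Those rules are simple enough to check in a content validation pass. Below is a hedged Python sketch of what such a check could look like. None of this is our actual tooling: the Gobo record, the targets set (which objects a gobo projects onto) and validate_level are all hypothetical, and the check treats "at a time" as one static snapshot of the level.

```python
from dataclasses import dataclass
from enum import Enum


class GoboKind(Enum):
    SUBTRACTIVE = "subtractive"  # e.g. leaf shadows
    ADDITIVE = "additive"        # e.g. stained glass


class GoboLighting(Enum):
    DIRECTIONAL = "directional"  # gobo acts like a directional light
    AMBIENT = "ambient"          # cheaper: gobo only adds ambient


@dataclass
class Gobo:
    name: str
    kind: GoboKind
    targets: set  # hypothetical: names of the objects this gobo projects onto


def validate_level(gobos, receivers):
    """Return a list of rule violations for one snapshot of a level.

    receivers maps object name -> GoboLighting for objects that opted in.
    """
    errors = []
    if len({g.kind for g in gobos}) > 1:
        errors.append("level mixes subtractive and additive gobos")
    hit_by = {}
    for gobo in gobos:
        for obj in sorted(gobo.targets):
            if obj not in receivers:
                errors.append(f"{obj} is not marked as a gobo receiver")
            elif obj in hit_by:
                errors.append(f"{obj} is hit by both {hit_by[obj]} and {gobo.name}")
            else:
                hit_by[obj] = gobo.name
    return errors
```

An empty list means the level plays by the rules; anything else goes back to the artist.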

Concept art for The Temple

The gobos themselves can't be animated - they're not Light Functions in Epic-speak. Light Functions are largely unworkable, as they're too open-ended cost-wise - the extra pass per-pixel is too expensive (MK demands each pixel is touched only once, to the point of no Z-prepass). And hey, they're generally *extremely* rare to see in UE3 titles, even by Epic, for good reason. But, we can fake some complex looking effects by allowing artists to animate the gobo location, or switch between active gobos. Flying these gobo casters around is how we animate the dragon-chase sequence found in RoofTop-Day, which ended up being quite clever.

But that's the point really. The difference between building a game and building an engine is figuring out clever uses of tech, not trying to solve open-ended problems. The key is always to make it look like you're doing a whole lot more than you actually are. If you're going to spend the rendering time on an effect, make sure it's an obvious and dramatic one.

Thursday, March 1, 2012

A quick note to those who might see this - a colleague (the ever brilliant Gavin Freyberg) and I will be giving a talk at the Epic booth during GDC'12. I've been told it's Wednesday around 11:45ish. The talk will be reviewing a variety of things we've been doing related to 60Hz - mostly covering work done in MK9 (aka Mortal Kombat 2011), but with a small smattering of info on some of the more recent stuff our team's been up to on our next game. A good chunk of the MK9 info is stuff we really haven't talked about in any kind of detail before, so it might be of interest to some.

Nothing about our next game itself of course - that's PR's job for when we eventually announce it.

Sunday, March 21, 2010


Okay, time for a minor pet-peeve. If you're going to include a DVD with a book that includes sample implementations of concepts you MUST make it a requirement that people include some kind of sample for EVERY CHAPTER. I realize that's an additional burden on the author, and the editor might go through hell trying to wrangle all those chapters. Oh, and no doubt there are concerns about code quality, and lots of experimental code tends to be a little spaghetti-ish. But frankly I'll take anything if it can show me a sample implementation I can just quickly rip apart and get to the meat of.
Implementation details are often left out, or left unclear, or just left as exercises for the reader to discover. Given the purpose of a "Gems"-style book - which is to provide a reader with insight into *implementation* details and not simply to get some idea published - it's important that the reader can walk away with as clear an understanding as possible.