Thursday, August 13, 2009

Has someone tried this before?

Okay, so I'll be describing a simple approach that I haven't seen being pushed around. (I should admit, it bears some pretty obvious similarities to Peter-Pike Sloan et al.'s Image-Based Proxy Accumulation for Real-Time Soft Global Illumination, though I think my suggested method is a lot simpler.) There's been a lot of talk about deferred lighting solutions for renderers. Most of these solutions have one big thing in common: they require a per-pixel screenspace normal map and specular exponent (or some other fancy BRDFish properties) to have been written out in advance. Not a huge limitation, but a real one nonetheless. So what follows removes that limitation:

a) do a standard Z-prepass
b) allocate a 3-deep MRT color buffer, init to black
c) now evaluate all light sources as one normally might for any standard deferred lighting solution, except we add to the color buffers view-space relative Spherical Harmonic representations of the light's irradiance relative to the current pixel.

To state outright what's implicit above: you're writing a very simple 3-color, 4-coefficient SH into the buffer (3 x 4 = 12 values per texel, which is why the MRT is 3 deep). Alternatively, one might choose a hemispherical basis that needs fewer coefficients, but there are good reasons to stick with a spherical one (primarily that you can handle arbitrary reflection vectors).
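To make the accumulate-then-shade idea concrete, here is a minimal CPU-side sketch in Python. It is an illustration under assumptions rather than the actual shader: the function names, the per-channel list layout, and the use of the standard real SH basis constants with the usual clamped-cosine convolution weights (pi for band 0, 2*pi/3 for band 1) are all choices made for this example. Directions are assumed to be unit vectors in view space.

```python
import math

# Real SH basis constants for bands 0 and 1 (4 coefficients total).
Y0 = 0.5 * math.sqrt(1.0 / math.pi)   # ~0.282095
Y1 = 0.5 * math.sqrt(3.0 / math.pi)   # ~0.488603

# Clamped-cosine convolution weights, used to turn radiance SH into irradiance.
A0 = math.pi
A1 = 2.0 * math.pi / 3.0


def sh_basis(d):
    """Evaluate the 4 basis functions for a unit direction d = (x, y, z)."""
    x, y, z = d
    return (Y0, Y1 * y, Y1 * z, Y1 * x)


def new_texel():
    """One texel of the SH buffer: 3 color channels x 4 coefficients, black."""
    return [[0.0] * 4 for _ in range(3)]


def add_light(texel, direction, color):
    """Additively accumulate a directional light (hypothetical helper).

    direction: unit vector from the pixel toward the light.
    color: (r, g, b) intensity arriving at this pixel.
    """
    basis = sh_basis(direction)
    for c in range(3):
        for i in range(4):
            texel[c][i] += color[c] * basis[i]


def irradiance(texel, normal):
    """Diffuse lookup at shading time, once the normal is finally known."""
    basis = sh_basis(normal)
    out = []
    for c in range(3):
        e = A0 * texel[c][0] * basis[0]
        e += A1 * sum(texel[c][i] * basis[i] for i in (1, 2, 3))
        out.append(max(0.0, e))
    return tuple(out)


def radiance(texel, direction):
    """Raw SH lookup along any direction, e.g. a reflection vector."""
    basis = sh_basis(direction)
    return tuple(
        max(0.0, sum(texel[c][i] * basis[i] for i in range(4)))
        for c in range(3)
    )


# Lighting pass: no normals needed yet.
texel = new_texel()
add_light(texel, (0.0, 0.0, 1.0), (1.0, 0.5, 0.25))

# Shading pass: the normal shows up only now.
facing = irradiance(texel, (0.0, 0.0, 1.0))
```

With only 4 coefficients, a surface facing the light directly gets 0.25 + 0.5 = 0.75 of the light's color rather than 1.0, which gives a feel for how soft this representation is.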

So, why bother with this? Here are a few interesting "wins".
1) lighting is entirely decoupled from all other properties of the framebuffer - even local normals. Lighting can be evaluated separately when we get to shading using a lighting model that changes relative to the shading.

2) since lighting is typically low frequency in scenes, you can almost certainly get away with constructing the buffer at a resolution lower than the real frame buffer. In fact, since the lighting is independent of things like normal discontinuities, you might even be able to get away with ignoring edge discontinuities. Or you could probably work around this using an ID buffer constructed later, similar to how Inferred Lighting works, to multisample the SH triplet and choose the best sample (though that seems pretty expensive).

3) To me this is really the biggest win of all - because this is decoupled from the need for a G-buffer or any actual properties at the pixel other than its location in space, you could really start doing this work immediately after a prepass has begun! This is becoming more and more important going forward, as getting more and more stuff independent means it's easier to break things into separate jobs and distribute across processing elements. In this case you could actually subdivide the work and have the GPU and CPU/SPU split the work in a fairly simple way, and it's almost the perfect SPU-type task as you don't need any underlying data from the source pixel other than Z.

4) MSAA can be handled in any number of ways, but at the very least, you can deal with it the same way Inferred Lighting does.

5) There's no reason for specularity to suffer from the typical Light-Prepass problem of color corruption by using diffuse color multiplied by spec intensity to fake specular color. Instead you could just evaluate the SH with the reflection vector. Of course, one does need to consider that given the low frequency of the SH as it applies to specularity...

6) Inferred Lighting evaluates the lighting at low frequency and upscales. Unfortunately, if you have very high frequency normal detail (we generally do), this is bad, as this detail is mostly lost: their discontinuity filter only deals with identifying normals at the facet level, and not at the texture level. The suggested method isn't dependent on normals at all, as lighting is accumulated independent of them, so it doesn't suffer from that problem.

7) You can start to do a lot of strange stuff with this. For example:
- want to calculate a simple GI approximation? Basically do the standard operating procedure Z based spherical search used in most SSAO solutions, except when a z-texel "passes", accumulate its SH solution multiplied by its albedo and a transfer factor (to dampen things). Now you've basically got the surrounding lighting...
- want to handle large quantities of particles getting lit without doing a weird forward rendering pass that violates the elegance of your code? Do a 2nd Z-pass, this time picking the nearest values, and render the transparent stuff's Z into a new Z buffer. Now, regenerate the SH buffer using this second nearer Z-set into a new buffer set. You now effectively have a light volume, so when rendering the individual particles, simply lerp the two SH values at the given pixel based on the particle's Z (you could even do this at vertex level, since sampling the two SH-sets per-pixel seems cost prohibitive). Of course, this assumes you even care about the lighting being different at varying points in the volume, as you could just use the base set.
- if you rearrange the coefficients and place all the 0th coefficients together in one of the SH buffers you can LOD the lighting quality for distant objects by simply extracting that as a loose non-directional ambient factor for greatly simplified shading.
- you can rasterize baked prelighting directly into the solution if your prelighting is in the same or a transformable basis... assuming people still care about that.
- if you construct the SH volume, you could use it to evaluate scattering in some more interesting ways... You could also use this "SH volume" to do a pretty interesting faking of general volumetric lighting. If one were to get very adventurous, you could - instead of using min-z-distance as the top cap, simply use the near plane, and then potentially subdivide along Z if you wanted, writing the lighting into a thin volume texture.
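Two of the ideas in the list above are easy to sketch: lerping between the near and far SH sets by particle depth, and pulling out the 0th coefficients as a cheap ambient term. The Python below is only an illustration. The coefficient-major layout (a list of 4 RGB tuples, with index 0 holding the 0th coefficients) and both function names are assumptions made for this example, as is the choice to scale the ambient term by pi times the band-0 basis constant.

```python
import math

Y0 = 0.5 * math.sqrt(1.0 / math.pi)  # band-0 SH basis constant


def lerp_sh(near_sh, far_sh, z, z_near, z_far):
    """Blend two SH sets by where depth z falls between the two Z layers.

    near_sh / far_sh: lists of 4 (r, g, b) tuples, one per coefficient.
    z_near / z_far: depths at which the two SH buffers were generated.
    """
    span = z_far - z_near
    if span <= 0.0:
        t = 0.0  # degenerate volume: just use the near set
    else:
        t = min(1.0, max(0.0, (z - z_near) / span))
    return [
        tuple(n + (f - n) * t for n, f in zip(near_c, far_c))
        for near_c, far_c in zip(near_sh, far_sh)
    ]


def ambient_from_dc(sh):
    """Non-directional ambient from the 0th coefficients only (LOD path)."""
    return tuple(math.pi * Y0 * c for c in sh[0])


# Example: a particle halfway between the two layers.
black = (0.0, 0.0, 0.0)
near_set = [(1.0, 1.0, 1.0), black, black, black]
far_set = [(3.0, 3.0, 3.0), black, black, black]
blended = lerp_sh(near_set, far_set, 5.0, 0.0, 10.0)
ambient = ambient_from_dc(blended)
```

Because the lerp is linear in the coefficients, it can run per vertex and be interpolated across the particle without changing the result much, which is what makes the cheaper vertex-level option plausible.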

So, the "bad":
- lots of data, as we need 12 coefficients per lit texel. That's a lot of read bandwidth, but really it's not any more expensive than every pixel in our scene needing to read the Valve Radiosity Normal Map lighting basis, which we currently eat.
- Dealing with SH's is certainly confusing and complicated. For the most part this only involves adding SH's together which is pretty straightforward. But unfortunately converting lights into SH's is not free. The easiest thing to do is pre-evaluate a directional light basis and simply rotate it to the desired direction. Doable, given we're only dealing with 4 coefficients. Or, directly evaluate the directional light and construct its basis. Once you've a directional light working, you can use it to locally approximate point and spot lights by merely applying their attenuation equations. Of course, if you don't need any of the crazier stuff, you could just use a simpler basis (like the Valve one) where conversion is more straightforward.
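As a rough sketch of the "directional light plus attenuation" route for point lights, the Python below builds the 4 coefficients for a single color channel. The function name, the (1 - d/r)^2 falloff, and the handling of a light sitting exactly on the pixel are all assumptions chosen for illustration; any attenuation equation the engine already uses would slot in the same way.

```python
import math

Y0 = 0.5 * math.sqrt(1.0 / math.pi)  # band-0 SH basis constant
Y1 = 0.5 * math.sqrt(3.0 / math.pi)  # band-1 SH basis constant


def point_light_sh(light_pos, pixel_pos, intensity, radius):
    """Approximate a point light as a local directional light in SH.

    Returns 4 coefficients for one color channel. Positions are assumed
    to be in the same (view) space; radius is the light's cutoff distance.
    """
    to_light = tuple(l - p for l, p in zip(light_pos, pixel_pos))
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist >= radius:
        return (0.0, 0.0, 0.0, 0.0)  # outside the light's influence
    if dist < 1e-6:
        # Direction is undefined; keep only the non-directional term.
        return (intensity * Y0, 0.0, 0.0, 0.0)

    x, y, z = (c / dist for c in to_light)
    atten = (1.0 - dist / radius) ** 2  # assumed falloff, not from the post
    s = intensity * atten
    # Directly evaluating the basis along the light direction stands in
    # for "pre-evaluate a directional basis and rotate it".
    return (s * Y0, s * Y1 * y, s * Y1 * z, s * Y1 * x)


coeffs = point_light_sh((0.0, 0.0, 2.0), (0.0, 0.0, 0.0), 1.0, 4.0)
```

With only 4 coefficients, evaluating the basis along the light direction gives the same result as rotating a pre-evaluated light, so the direct evaluation is probably the simpler of the two options here.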

Anyway, there we go. If anyone reads this, let me know what you think. It would seem this solution is superior to Inferred Lighting in its handling of lit alpha, as with their solution you can really only "peel" a small number of unique pixels due to the stippling, and the more you peel, the more degradation it causes in the scene to the lighting.

Anyway, for now until I can think of a better name, I'm calling it Immediate Lighting.

Sunday, May 17, 2009

Designing for 60Hz

So, rendering at 60Hz. For a fighting game. Now, if you do this stuff for a living, this may or may not sound easy or difficult to do (how's that for non-committal?). The truth is, certainly with my bias in mind, this stuff is difficult. I mean, as they say, if it was easy everyone would be doing it.

Okay, the rule with rendering is... know your scenario. If your game designers, artists and creative leads actually know what they're doing, a big part of their job is to define the box the game will need to fit into. If they can't define limitations and stick to them, they are not doing their jobs. Simple. One nice thing about our genre requiring us to run at 60Hz is that, unlike I guess quite a few of my equivalents on other teams, I can actually push back quite a bit and call people out on their demands (i.e., bullshit).

For example, density - how much crap do I need to deal with? Where can the camera go? How close does it get to stuff? Do I need to worry about LOD? How much in the scene is dynamic, or expected to be? Is this dynamic behaviour predictable? How much translucency is in the scene? What kind of effects are needed? And so on...

Anyhoo, like I said, the big deal here is to know the limits of what you need to deal with, because there is only so much time in which to do everything you need (and/or want) to do. This gives you basic requirements you need to work within so you can figure out what features you can support. And sadly, you quickly find out that at 60Hz... it ain't a whole lot.

For example, all sorts of things one would automatically take for granted need to be called into question. Do I do a Z-prepass? Is it worth the CPU overhead, the fillrate burn, the geometry processing... can I fake it out? Can I hand-sort the frame to avoid the Z-prepass work?

How do I light the scene? How complex does lighting need to be? Do I need to light everything? Just the characters? Are my characters normal mapped? If not, can I get away with vert lighting? Or maybe just texture projection? Can I get away with mixing and matching lighting quality? Can I defer lighting? Can I afford all the resolving that goes along with it to build the G-buffer? At the very least it's an extra pass that has a pretty high ROP cost...

And, oh boy, shadows. Shadows are a disaster. Pretty much there's no great solution that isn't stupidly expensive. Your shadowmaps are either too small, badly projected, undersampled, or not cleanly enough PCF'd. Whatever. And that's for just basic passable shadows. Texture bandwidth disasters, is what shadows are. And god forbid you need to render multiple shadows. But more on that another time.

Consider something like SSAO. Expensive, hard to do really well so its artifacts aren't distracting. Let's say you do the work at a lower resolution, compromise quality a little, and you're able to get the cost down to... say, 3 ms. In a 30Hz game that's a significant amount of time, but hardly a deal breaker. In a 60Hz game... well, that's about a fifth of the frame (rounding up, obviously). That's a pretty big deal for something that's often pretty subtle, though obviously impactful. So unfortunately while it's really the subtle stuff that makes the scene look more realistic, it's generally the subtle stuff that ends up getting tossed out the window in favor of flashier and more obvious stuff.
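The arithmetic behind that "about a fifth" is worth spelling out. A tiny Python sketch (the function names are just for illustration):

```python
def frame_budget_ms(hz):
    """Milliseconds available per frame at a given refresh rate."""
    return 1000.0 / hz


def share_of_frame(cost_ms, hz):
    """Fraction of the frame budget consumed by one effect."""
    return cost_ms / frame_budget_ms(hz)


ssao_ms = 3.0
at_30 = share_of_frame(ssao_ms, 30)  # 3 / 33.33 ms, about 9% of the frame
at_60 = share_of_frame(ssao_ms, 60)  # 3 / 16.67 ms, about 18% of the frame
```

So the same 3 ms effect costs twice the share of the frame at 60Hz, and 18% rounds up to roughly a fifth.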

Then there's one last really stupid problem - platform bias. For example, move too much of the rendering work from the RSX to SPU on PS3 and you could actually cause more problems than you solve. If you live in a world where all your artists test on 360 and pretty much "the chumps" on the port team are the only people who ever see PS3 running, you can probably happily go nuts pushing work to SPU. If you live on a team that tries to be more disciplined about platform support (some of our artists currently only test on PS3, some only on 360, leads test on both), that can somewhat backfire. So you basically have to design solutions to problems that take both platforms' weaknesses into account from the start, rather than treat it as a porting problem afterwards. Certainly less fun, but much safer than the alternative. For example, an SPU-only solution to SSAO to get that 3 ms back probably doesn't really make sense unless the 360 is so far ahead performance-wise that you've got time to burn.

That's enough for tonight. I really wish I could talk about the specific stuff I'm playing with now, as it's pretty darn cool... but hey, I'll bounce more vagaries and continue to hint at things. :) Some more details another time.

Saturday, May 9, 2009

Oooh, a little more please.

Ah well. So as is typical, I don't think I'm allowed to talk about specifics of what I'm doing just yet, but needless to say, I am working on the next MK game. Can't talk about details, as I don't want to spoil whatever eventual PR we're doing.

That said, I can talk in loose vagaries. So, for example, I can say that tech for the game is progressing nicely. The really good thing about working on this game so far is that we're currently at that nice cruising point where you don't have to worry about getting basics working. I mean, there always is the option of scrapping and reworking existing systems... but it's not the same thing as having to build them up that first time. We have the benefit of having shipped a game forcing us to have built a whole series of tech to get that product done.

So... technically we could probably rest on our laurels. I mean, we shipped a fighting game already on this generation of hardware. It's a reasonable guess to assume the next MK is some kind of fighting game, so we probably have enough tech that if all we wanted to do was a content update, we probably could. But, really, where's the fun in that?

Well go figure, we're improving various systems. Upgrading some, rewriting others, adding some new ones where we didn't worry about certain kinds of features previously. After all, game development is really an arms race and we have to assume that in two years people will simply expect more. Me, I'm overhauling various parts of the rendering backend. Truth is, a lot of the talk I gave at GDC will no longer represent how things are done in our fighting engine by the time I'm done. But hey, that's the price of progress.

As time goes on, I hope/plan to post some insights and ideas related to what I do, which is primarily rendering work. So, stay tuned, as there's certainly more to come.

And so we begin?

Well, here we go. Time to maybe start a blog, post the occasional idea, maybe even get a little bit of feedback. So... first post!

So, this post will deal with something I should have dealt with 2 months ago - posting my GDC presentation from this March to the web somewhere that people can find it. So, here goes:

Hitting 60Hz in Unreal Engine

So, there we go. Two things done at once. First, established a blog - second, figured out how to embed a Powerpoint presentation inside said blog! Only problem is, I have absolutely no idea how to go about getting the linked video to work. Oh well, the pair of screenshots will have to do.

This should be a pretty close to final version of the presentation. It should match the presentation I gave other than maybe a couple minor corrections in wording, and the credit for Nate turning into a "thanks" in the final version.

I did get some comments afterwards about how it would have been nice to show more video/images showing a lot of this stuff in action. I'll see if I can scrounge stuff up at some point (although, the game's been on sale since early November 2008, so you can always just buy a copy. Heh).