Thursday, August 13, 2009

Has someone tried this before?

Okay, so I'll be describing a simple approach that I haven't seen being pushed around (I should admit, it bears some pretty obvious similarities to Peter-Pike Sloan et al.'s Image-Based Proxy Accumulation for Real-Time Soft Global Illumination, though I think my suggested method is a lot simpler). There's been a lot of talk about deferred lighting solutions for renderers. Most of these solutions have one big thing in common: they require a per-pixel screen-space normal map and specular exponent (or some other fancy BRDF-ish properties) to have been written out in advance. Not a huge limitation, but a real one nonetheless. So what follows removes that limitation:

a) do a standard Z-prepass
b) allocate a 3-deep MRT color buffer, init to black
c) now evaluate all light sources as one normally might for any standard deferred lighting solution, except that instead of writing lit color, we accumulate into the color buffers a view-space spherical harmonic representation of each light's irradiance at the current pixel (sketched below)
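
To make step c) a bit more concrete, here's a minimal CPU-style sketch of what the accumulation pass effectively computes for a single point light at a single covered texel; it's illustrative, not prescriptive. The struct names and the falloff curve are placeholders of mine, and the view-space position is assumed to have already been reconstructed from the prepass Z. In practice this would be a pixel shader additively blending into the three RGBA targets.

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct SH4  { float c[4]; };            // 1 band-0 + 3 band-1 coefficients

// The four L1 spherical harmonic basis functions for a unit direction.
static SH4 shBasis(const Vec3& d) {
    return { { 0.282095f,               // Y0,0
               0.488603f * d.y,         // Y1,-1
               0.488603f * d.z,         // Y1,0
               0.488603f * d.x } };     // Y1,1
}

// One point light's contribution to the three SH color buffers at a texel.
// 'P' is the view-space position reconstructed from the Z-prepass depth.
// One SH4 per color channel is accumulated; the buffers start at black and
// each light adds onto whatever is already there (additive blend).
void accumulatePointLight(const Vec3& P, const Vec3& lightPosVS,
                          const Vec3& lightColor, float radius,
                          SH4 shRGB[3]) {
    Vec3 toLight = { lightPosVS.x - P.x, lightPosVS.y - P.y, lightPosVS.z - P.z };
    float dist = std::sqrt(toLight.x*toLight.x + toLight.y*toLight.y + toLight.z*toLight.z);
    if (dist <= 0.0f || dist >= radius) return;   // degenerate, or outside the light's influence
    Vec3 dir = { toLight.x / dist, toLight.y / dist, toLight.z / dist };

    // Placeholder smooth falloff; swap in whatever attenuation you already use.
    float atten = std::max(0.0f, 1.0f - dist / radius);
    atten *= atten;

    SH4 basis = shBasis(dir);
    const float rgb[3] = { lightColor.x, lightColor.y, lightColor.z };
    for (int ch = 0; ch < 3; ++ch)
        for (int i = 0; i < 4; ++i)
            shRGB[ch].c[i] += rgb[ch] * atten * basis.c[i];
}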

It's implicit, but to state it outright: you're writing a very simple 3-color, 4-coefficient SH into the buffer. Alternatively, one might choose a hemispherical basis that needs fewer coefficients, but there are good reasons to stick with a spherical one (primarily that you can handle arbitrary reflection vectors).
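
As a rough sketch of how that buffer gets consumed later (see points 1 and 5 below), here's one way a pixel might evaluate the stored 3-color, 4-coefficient SH: with the surface normal through the cosine-lobe-convolved constants for diffuse, or with an arbitrary direction such as the reflection vector for a very blurry specular term. The function names and the choice to fold the cosine convolution in at evaluation time are my own convention, one of several you could pick.

#include <algorithm>

struct Vec3 { float x, y, z; };
struct SH4  { float c[4]; };   // coefficients for one color channel

// Diffuse irradiance: dot the stored coefficients against the L1 basis
// convolved with a clamped cosine lobe (the familiar 0.886 / 1.023 constants).
float shIrradiance(const SH4& sh, const Vec3& n) {
    return std::max(0.0f,
        0.886227f * sh.c[0] +
        1.023328f * (sh.c[1] * n.y + sh.c[2] * n.z + sh.c[3] * n.x));
}

// Very-low-frequency radiance lookup in an arbitrary direction, e.g. the
// reflection vector R for the blurry "specular" suggested in point 5.
float shRadiance(const SH4& sh, const Vec3& d) {
    return std::max(0.0f,
        0.282095f * sh.c[0] +
        0.488603f * (sh.c[1] * d.y + sh.c[2] * d.z + sh.c[3] * d.x));
}

// Per pixel at shading time: evaluate once per color channel with whatever
// normal / reflection vector the material provides. Albedo, spec color and
// the rest of the BRDF are applied afterwards by the material shader.
void shadePixel(const SH4 shRGB[3], const Vec3& N, const Vec3& R,
                float diffuse[3], float specular[3]) {
    for (int ch = 0; ch < 3; ++ch) {
        diffuse[ch]  = shIrradiance(shRGB[ch], N);
        specular[ch] = shRadiance(shRGB[ch], R);
    }
}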

So, why bother with this? Here are a few interesting "wins".
1) lighting is entirely decoupled from all other properties of the framebuffer - even local normals. Lighting can then be evaluated at shading time with whatever lighting model the material in question calls for (as in the evaluation sketch above).

2) since lighting is typically low frequency in scenes, you can almost certainly get away with constructing the buffer at a resolution lower than the real frame buffer. In fact, since the lighting is independent of things like normal discontinuities, you might even be able to get away with ignoring edge discontinuities. Or you could probably work around them with an ID buffer constructed later, similar to how Inferred Lighting does, multisampling the SH triplet and choosing the best sample (though that seems pretty expensive).

3) To me this is really the biggest win of all - because this is decoupled from the need for a G-buffer or any actual properties at the pixel other than its location in space, you could start doing this work immediately after the prepass has begun! This is becoming more and more important going forward, as making more work independent makes it easier to break things into separate jobs and distribute them across processing elements. In this case you could subdivide the work and have the GPU and CPU/SPU split it in a fairly simple way, and it's almost the perfect SPU-type task, as you don't need any underlying data from the source pixel other than Z.

4) MSAA can be handled in any number of ways, but at the very least, you can deal with it the same way Inferred Lighting does.

5) There's no reason for specularity to suffer from the typical Light-Prepass problem of color corruption, where diffuse color multiplied by spec intensity is used to fake specular color. Instead you could just evaluate the SH with the reflection vector. Of course, one does need to keep in mind how low frequency the SH is as it applies to specularity...

6) Inferred Lighting evaluates the lighting at low frequency and upscales. Unfortunately, if you have very high frequency normal detail (we generally do), this is bad: most of that detail is lost, since their discontinuity filter only identifies discontinuities at the facet level, not at the texture level. The suggested method isn't dependent on normals at all, as lighting is accumulated independent of them, so it doesn't suffer from that problem.

7) You can start to do a lot of strange stuff with this. For example:
- want to calculate a simple GI approximation? Basically do the standard Z-based spherical search used in most SSAO solutions, except when a z-texel "passes", accumulate its SH solution multiplied by its albedo and a transfer factor (to dampen things). Now you've basically got the surrounding lighting... (there's a rough sketch of this gather after the list)
- want to handle large quantities of particles getting lit without doing a weird forward rendering pass that violates the elegance of your code? Do a 2nd Z-pass, this time picking the nearest values, and render the transparent stuff's Z into a new Z buffer. Now regenerate the SH buffer using this second, nearer Z-set into a new buffer set. You now effectively have a light volume, so when rendering the individual particles, simply lerp the two SH values at the given pixel based on the particle's Z (you could even do this at the vertex level if sampling the two SH sets per-pixel seems cost prohibitive; see the second sketch after the list). Of course, this assumes you even care about the lighting being different at varying points in the volume, as you could just use the base set.
- if you rearrange the coefficients and place all the 0th coefficients together in one of the SH buffers you can LOD the lighting quality for distant objects by simply extracting that as a loose non-directional ambient factor for greatly simplified shading.
- you can rasterize baked prelighting directly into the solution if your prelighting is in the same or a transformable basis... assuming people still care about that.
- if you construct the SH volume, you could use it to evaluate scattering in some more interesting ways... You could also use this "SH volume" to do a pretty interesting faking of general volumetric lighting. If one were to get very adventurous, you could use the near plane as the top cap instead of the minimum Z distance, and then potentially subdivide along Z if you wanted, writing the lighting into a thin volume texture.
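
Here's a rough sketch of the GI gather from the first bullet above. The kernel shape, the depth "pass" test, and the assumption that an albedo buffer is available alongside the SH buffer are all things you'd tune to taste; this only shows the accumulation.

#include <cmath>
#include <vector>

struct SH4   { float c[4]; };
struct Texel { SH4 shRGB[3]; float albedo[3]; float viewZ; };

// Hypothetical accessor into the (low-res) SH buffer plus an albedo buffer.
const Texel& fetch(const std::vector<Texel>& buf, int w, int x, int y) {
    return buf[y * w + x];
}

// One-bounce GI gather at texel (cx, cy): SSAO-style kernel over neighbours,
// accepting a neighbour when its depth lies within 'zRange' of the centre,
// and accumulating its SH tinted by its albedo and damped by 'transfer'.
// Accumulates into bounceRGB, which the caller zero-initialises.
void gatherBounce(const std::vector<Texel>& buf, int w, int h,
                  int cx, int cy, int radius, float zRange, float transfer,
                  SH4 bounceRGB[3]) {
    const Texel& centre = fetch(buf, w, cx, cy);
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx == 0 && dy == 0) continue;
            int x = cx + dx, y = cy + dy;
            if (x < 0 || y < 0 || x >= w || y >= h) continue;
            const Texel& n = fetch(buf, w, x, y);
            if (std::fabs(n.viewZ - centre.viewZ) > zRange) continue;   // the z-texel "passes" test
            for (int ch = 0; ch < 3; ++ch)
                for (int i = 0; i < 4; ++i)
                    bounceRGB[ch].c[i] += n.shRGB[ch].c[i] * n.albedo[ch] * transfer;
        }
    }
}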
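And a minimal sketch of the particle lerp from the second bullet, assuming the near and far SH sets have already been generated; the names and the clamped linear blend are just the simplest thing that could work.

#include <algorithm>

struct SH4 { float c[4]; };

// Blend between the far (opaque-Z) and near (transparent-Z) SH sets for a
// particle lying at view-space depth 'particleZ' somewhere between the two
// layers at this pixel. Could run per vertex if per-pixel is too expensive.
void lerpLightVolume(const SH4 nearRGB[3], const SH4 farRGB[3],
                     float nearZ, float farZ, float particleZ,
                     SH4 outRGB[3]) {
    float t = 0.5f;
    if (farZ > nearZ)   // guard against the two layers coinciding
        t = std::min(1.0f, std::max(0.0f, (particleZ - nearZ) / (farZ - nearZ)));
    for (int ch = 0; ch < 3; ++ch)
        for (int i = 0; i < 4; ++i)
            outRGB[ch].c[i] = nearRGB[ch].c[i] * (1.0f - t) + farRGB[ch].c[i] * t;
}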

So, the "bad":
- lots of data, as we need 12 coefficients per lit texel. That's a lot of read bandwidth, but really it's not any more expensive than every pixel in our scene needing to read the Valve Radiosity Normal Map lighting basis, which we currently eat.
- Dealing with SHs is certainly confusing and complicated. For the most part this only involves adding SHs together, which is pretty straightforward. But unfortunately, converting lights into SHs is not free. The easiest thing to do is pre-evaluate a directional light basis and simply rotate it to the desired direction; doable, given we're only dealing with 4 coefficients. Or, directly evaluate the directional light and construct its basis. Once you've got a directional light working, you can use it to locally approximate point and spot lights by merely applying their attenuation equations (a quick sketch follows below). Of course, if you don't need any of the crazier stuff, you could just use a simpler basis (like the Valve one) where conversion is more straightforward.
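
For what the "rotation" actually amounts to with only two bands, here's a sketch: the canonical directional lobe is just two zonal constants, rotating it to a direction d only scales the band-1 terms by d's components, and a point or spot light reuses the same code with its attenuation folded in. The constants below assume the simple delta-direction convention from the earlier accumulation sketch; a softer lobe would only change the two zonal values.

struct Vec3 { float x, y, z; };
struct SH4  { float c[4]; };

// Zonal (z-aligned) coefficients of the canonical directional lobe. For a
// pure delta direction these are just the basis values at +Z.
static const float kZonal0 = 0.282095f;
static const float kZonal1 = 0.488603f;

// "Rotating" a two-band zonal lobe to direction d is just a scale of the
// band-1 coefficients by d's components; no general SH rotation is needed.
SH4 rotateLobe(const Vec3& d, float intensity) {
    return { { intensity * kZonal0,
               intensity * kZonal1 * d.y,
               intensity * kZonal1 * d.z,
               intensity * kZonal1 * d.x } };
}

// A point (or spot) light is approximated locally as a directional light:
// same lobe, aimed along the to-light direction, scaled by its attenuation.
SH4 localPointLight(const Vec3& toLightDir, float intensity, float attenuation) {
    return rotateLobe(toLightDir, intensity * attenuation);
}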

Anyway, there we go. If anyone reads this, let me know what you think. It would seem this solution is superior to Inferred Lighting in its handling of lit alpha: with their solution you can really only "peel" a small number of unique pixels due to the stippling, and the more you peel, the more the scene's lighting degrades.

Anyway, for now until I can think of a better name, I'm calling it Immediate Lighting.

6 comments:

  1. Interesting idea. BTW, what about using a cube map lookup on light direction to construct the SH? Might be faster than either performing the rotation or constructing the basis.

    In some ways this is similar to what Crytek is doing, but instead of creating a full 3D volume texture of SH, you're creating an "adaptive slice" between the frontmost and backmost lit pixels and lerping in between. I imagine if you have a big depth range, lerping between front and back may not give you the results you really want for lit translucency in the middle (I'm imagining a shooter with stuff potentially very far away, and two effects in between).

    You might be able to borrow a technique from Crytek and produce glossy reflections of the translucency on your opaque stuff -- they essentially do a bound ray march in their 3d SH volume accumulating irradiance along a direction. You could do a single ray along a direction with your scheme and achieve an approximation of that effect.

    Another question I'd have is if 8 bits per component is going to be enough to encode the dynamic range (probably depends on your expected scene).

    I think the tradeoff here is that even with a smaller lighting buffer, lights are probably more expensive to evaluate than in traditional deferred lighting (just my gut feeling). But you're saving a lot of geometry costs due to a full-speed depth pass, and you get full resolution lighting for transparency. So the fixed overhead is probably smaller.

  2. Good suggestion - cubemaps would work great running on the GPU to accelerate things, as you could embed all 4 coefficients and avoid the rotation. Unfortunately, I don't think they'd necessarily translate well to the SPU, as memory's a huge concern there while computation is cheap as dirt. Since one of the greater strengths of the technique is how easily you can push the work to the SPU (while the GPU starts on something else or takes on a piece of the work), that would seem a problematic choice there. Six of one, half dozen, etc.
    You're right about the two layers quite possibly not being enough for general transparency. Still, you could actually cheat that in a number of ways:
    a) if you assume the particles are clustered in depth, calc a max/min specific to the particle cluster and lerp across that.
    b) if they're more distributed, cut Z into more regular slices. Truth is, you can probably get away with VERY low res lighting for the translucency comparatively, so you can probably trade height/width for depth.
    If I understood correctly, they're only rendering the VPLs into the radiant cube and not directly evaluating the direct lighting in it. As well, Crytek's solution is really tailored towards small areas (they mention in the presentation it's really meant for smaller indoor regions) otherwise they'd certainly suffer from a similar problem, as their cube is only 32x32x32.
    On to the precision question... well, 8 bpc is pretty low, and you'd probably need some kind of per-component scale factor (basically some scaling constants) to get a decent range. But, realistically, the Radiosity Normal Maps we presently use are all only 8 bpc themselves with a single scalar scale factor. So I think the inaccuracies can be survived and shouldn't be too problematic. I'm pretty sure there are talks out there on the web actually going over the best way to quantize each of the first 4 SH coefficients. Regardless, I suppose if you wanted to get really fancy you could build a simple scale map.
    And lastly, performance. You know me... performance is always my prime concern with all this stuff. I think the real performance win is that the buffer can be quite a bit lower res than the actual frame buffer (or a comparable deferred light buffer), so despite its extra weight and the additional computation I think it trades off quite well. Compare the typical resolution of lightmaps to the actual local resolution of the frame and you'll see what I mean. It wouldn't surprise me if you could trivially drop the SH buffer to a quarter of the real frame in each dimension. Compare that with a typical Light-Prepass solution where you actually have to light at *double* res to account for MSAA... I think it's quite possible the SH solution ends up cheaper overall - of course, there's really only one way to find out...

  3. Yeah, I was thinking of using a cubemap on platforms where there is no SPU and it probably would still be faster to use the GPU to render the irradiance buffer.

    Something else you could do, if you're going to end up using a discontinuity-aware filter anyway, is borrow from Inferred Lighting and, for your front slice, render your layers of transparency stippled but still in a separate buffer from the opaque irradiance. As you point out, translucency could probably do without as high a frequency of lighting anyway. This would allow you to get 5 depth slices per lighting element (lixel?).

    As far as Crytek's solution goes, it's local in the sense that cascaded shadow maps are local (so there is some cutoff distance, and results are even lower frequency closer to that distance), which is still a pretty decent ways out from the camera. They also do throw some direct lighting in there, mainly as a LOD technique (I think it could be either artist-controlled or based on heuristics of light size vs cell size). I've mostly been interested in LPVs for indirect lighting, but not from a dynamically computed source; rather from a precomputed irradiance volume. This would offer some interesting opportunities, because you could prefilter this static data at different resolutions ahead of time, and then it's a simple resampling problem to generate the volume textures (or to populate the irradiance buffer you propose with indirect lighting).

    Interestingly enough, if you do the math assuming you can get away with quarter-res lighting (assuming a 1280x720 render resolution and a 320x180 irradiance buffer size):

    320x180x4 channels (rgb sh + scaling coefficient for 3 channels) x 2 buffers (one for opaque, one for stippled translucency) = 1.8M of bandwidth (assuming 4 byte render targets)

    Crytek's LPVs with 6 cascades. 32x32x32x4 channels x 6 cascades = 3 M of bandwidth

    I think most of the videos they have use 2 cascades, but I guess we're not talking unreasonable amounts of bandwidth here.

    Is your plan to render the first depth pass at the lower resolution and then just toss that depth buffer for the "true" geometry pass at full res? Seems simpler than attempting to downsample.

  4. If you just went with straight up 16 bit floating point buffers (so just 3 rgb sh), you're still under the bandwidth of the Crytek solution (at 2.7M).

  5. Good numbers, and interesting that it might prove cheaper in the long run. As far as downsampling Z vs merely rerendering... I suppose that depends on the density of the geometry. Let's face it, a first step of downsampling Z probably isn't really that expensive (though admittedly the contents are a bit dubious). I suppose some knowledge of the scene is required to choose.

    The only real problem I see with the stippling method is that it really only handles a small number of overlapping samples at a particular pixel, as it's the first 4 in that claim the 2x2 square. Either more regular division in Z or some kind of clustering-type method to put the detail where you want it would seem more desirable.

  6. I was able to do a quick and dirty implementation of this -- here are the results.
