Parallel Reduction of Voxel Coordinates


[See previous devlog entry: Global illumination pre-req visualizations]

I've implemented parallel reduction of voxel world space coordinates in a compute shader, and unfortunately performance isn't anywhere near acceptable. *sigh* It was an interesting experiment nonetheless, and it may still be viable if I can find a way to speed up the algorithm significantly. I may have to try an entirely different approach for the per-voxel world space coordinate reduction (lighting pre-pass)...

Here's an animated visualization of the parallel reduction of voxel world space coordinates (color coded), generated from the compute shader's output passes. It is slowed down a great deal; the actual reduction takes about 0.005 seconds in total:

P.S.: Here are the timing results, in seconds, for the compute passes on a 256x256 vbuffer:

span-size   time (s)
        2   0.000293
        4   0.000388
        8   0.000317
       16   0.000249
       32   0.000171
       64   0.000262
      128   0.000198
      256   0.000323
      512   0.000181
     1024   0.000165
     2048   0.000165
     4096   0.000177
     8192   0.000326
    16384   0.000480
    32768   0.000916
    65536   0.000721
total:      0.005332
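
For anyone wondering why there are 16 passes: a 256x256 vbuffer holds 65536 entries, and doubling the span size from 2 up to 65536 takes log2(65536) = 16 steps. Below is a minimal CPU sketch in Python of one way such a doubling reduction can work, assuming each pass merges neighbouring sorted, duplicate-free runs. This is not the actual compute shader; `span_sizes`, `merge_unique` and `reduce_unique` are hypothetical names made up for illustration.

```python
def span_sizes(n):
    """Yield the doubling span sizes 2, 4, ..., n (n assumed to be a power of two)."""
    span = 2
    while span <= n:
        yield span
        span *= 2


def merge_unique(a, b):
    """Merge two sorted, duplicate-free lists into one sorted, duplicate-free list."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            # Same voxel coordinate on both sides: keep a single copy.
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def reduce_unique(coords):
    """Reduce a power-of-two-length list of (x, y, z) tuples to its sorted unique set."""
    runs = [[c] for c in coords]
    for _span in span_sizes(len(coords)):
        # Assumption: on a GPU, each pair of runs would be merged by its own
        # thread or work group; here the pairs are simply processed in a loop.
        runs = [merge_unique(runs[k], runs[k + 1]) for k in range(0, len(runs), 2)]
    return runs[0]
```

In this sketch the later passes have fewer but much longer merges, so less of the work can run in parallel. That may or may not be what is behind the slower timings at the largest span sizes above.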

Comments


Hmmm, fascinating. What's the goal here again?


It's a many (pixels) to fewer (voxels) reduction problem, meant to minimize computation in the lighting passes. The ray march pass determines which on-screen pixels map to which voxels. The goal is to reduce this to a unique set or list of voxel world space coordinates prior to the lighting passes. The lighting passes would then do the direct sunlight ray march and cone tracing once per voxel (from its world space coordinate), rather than repeating them for every pixel that maps to that voxel. Hopefully my explanation makes sense.
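
To make the payoff concrete, here's a tiny CPU sketch in Python of the idea, not the shader itself. It assumes the vbuffer is a flat list holding one voxel coordinate per pixel, with `None` where a ray hit nothing, and `light_voxel` is a hypothetical stand-in for the expensive sunlight and cone tracing work.

```python
def light_per_voxel(vbuffer, light_voxel):
    """Light each unique voxel once, then hand the result back to every pixel.

    vbuffer:     list of (x, y, z) tuples, or None for pixels with no voxel
                 (the None convention is an assumption for this sketch).
    light_voxel: hypothetical callback doing the expensive per-voxel lighting.
    """
    # Many pixels -> fewer voxels: collapse to the unique set of coordinates.
    unique_voxels = {v for v in vbuffer if v is not None}

    # The expensive work runs once per unique voxel, not once per pixel.
    lit = {v: light_voxel(v) for v in unique_voxels}

    # Every pixel looks up the result for the voxel it maps to.
    return [lit[v] if v is not None else None for v in vbuffer]
```

With four pixels covering two voxels, `light_voxel` would be called twice instead of three times; the saving grows with how many pixels share each voxel.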

I know this kind of thing has been done before, but it's hard to find any information online on how to do it efficiently.

I'm going to try a parallel reduction by building binary search trees (with key being voxel world space coordinates) in parallel (one for each row?) in a compute shader from ray march pass data (vbuffer) and then merging those trees into one tree to see if that is a faster method. [Video (linked) has given me ideas of perhaps a better way to go about this whole problem.]
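
As a rough illustration of that plan, here's a sequential Python sketch: one binary search tree per vbuffer row, keyed by voxel coordinate, then a merge into a single tree. Everything here (`Node`, `insert`, `in_order`, `build_row_tree`, `merge_trees`, and `None` for empty pixels) is a hypothetical name or assumption for the sketch, and the trees are unbalanced for brevity.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root, ignoring duplicates. Returns the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root  # voxel already present, nothing to add
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child


def in_order(root):
    """Yield the keys of a BST in ascending order (iterative traversal)."""
    stack, node = [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right


def build_row_tree(row):
    """Build one tree from a single vbuffer row; on a GPU each row could get its own thread."""
    root = None
    for key in row:
        if key is not None:  # assumption: None marks a pixel with no voxel
            root = insert(root, key)
    return root


def merge_trees(trees):
    """Merge per-row trees into one by re-inserting every key."""
    merged = None
    for tree in trees:
        # Note: feeding sorted keys into an unbalanced tree degenerates it into
        # a list, so a real version would want balancing or a smarter merge.
        for key in in_order(tree):
            merged = insert(merged, key)
    return merged
```

An in-order walk of the merged tree then gives the unique voxel coordinates in sorted order, e.g. `list(in_order(merge_trees([build_row_tree(r) for r in rows])))`.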

This is relevant:

Oh yeah, I know what you're talking about. I believe I actually talked about this idea in these comments a while back :D