Bounding Volume Hierarchy (BVH) trees are ubiquitous in modern ray tracing applications and help accelerate the search for ray-triangle intersections in a scene by organizing geometry into progressively tighter axis-aligned bounding boxes.

*This document is based on the survey on bounding volume hierarchies for ray tracing ^{1}.*

The Ultimate Guide to Bounding Volume HierarchiesConstructionThe Surface Area Heuristic Other HeuristicsRDH - Ray Distribution HeuristicSolid Angle ProjectionOSAH - Occlusion Heuristic EPO - End Point OverlapLCV - Leaf Count VariabilityCWBVH - Compressed Wide BVHAlgorithmsTop-Down ConstructionSplitTerminateBottom-Up ConstructionSome Commonly Used AlgorithmsBBVH - Binned BVHSBVH - Spatial BVHLBVH - Linear BVHHLBVH - Hierarchical Linear BVHAAC - Approximate Agglomerative ClusteringTRBVH - Treelet Restructuring BVHPLOC - Parallel Locally-Ordered ClusteringPRBVH - Parallel Reinsertion BVHPerformanceOther BVHsMulti-Level HierarchiesWide BVHBVH LayoutTree LayoutCompressionWhy BVH?

The optimal acceleration structure for ray tracing is *believed* to be an NP-hard problem^{2}, so construction relies on heuristics and greedy algorithms. Since the goal of using an acceleration structure is to accelerate ray traversal, the ultimate goal is to reduce rendering time. Heuristics attempt to calculate the relative rendering time using different BVH structures.

The SAH assumes a few properties:

- All rays intersect the root node bounding box
- Ray directions are uniformly distributed
- None of the rays intersect any objects (begin outside of the BVH and are infinitely long)

The probability of an intersection is approximated by the surface area of the node. The larger the surface area, the more likely a random infinite ray will intersect the node. Each intersection is associated with a cost and the cost is minimized by bounding the geometry using the minimal surface area.

The expected cost of traversing a BVH tree is the sum of the costs of each interaction (i.e. ray-box, ray-triangle), weighted by their surface area relative to the total size of the root node.

Although SAH is the most common heuristic because it applies to the most general case, there are also other heuristics to optimize for specific cases.

The assumptions made with SAH that rays are infinitely long is not true and the cost can change drastically with the average length of rays. RDH considers real ray distribution by sampling a small set of rays and tracking the number of hits on each node.

Instead of using the proportional surface area, use the proportional number of hits at each node.

The first SAH asssumption that requires rays to originate outside the root node bounding box is very unrealistic for secondary rays. Secondary rays originate from an intersection point in the scene and thus always originate inside the root node bounding volume.

The probability of a ray hitting node

It may be useful to consider occlusion and proximity of geometry to ray in cost model when building BVH. Geometry marked as occluded is preferred to be built into a different subtree. While they can still be intersected, the traversal cost is higher for previously occluded geometry in this heuristic.

Occlusion information can be gathered from previous frames in an animation or by sampling a representative set of rays.

Overlapping nodes lower performance because a ray must traverse multiple subtrees to find the closest hit.

In this metric,

In SIMD execution, warps remain together in each part of the while loop so variances in leaf nodes delay the inner node loop.

This measures the standard deviation of the number of leaf nodes

Special cost functions are used to convert a binary BVH to wide BVH trees. One case of a wide BVH tree is the compressed wide BVH tree that has up to eight children per node.

When appropriate, nodes are either converted to one of eight child nodes or an inner node with a distributed forest with

In general, BVH build algorithms can be categorized either as **bottom-up** or **top-down**, each with their own advantages. Starting from the top helps focus optimizations at the top levels of the tree, which may be more critical to traversal performance. However, starting from the bottom optimizes at a finer granularity and usually generates BVH trees with a lower global cost.

`xxxxxxxxxx`

`push root onto stack`

`while stack is not empty:`

` pop node from stack`

` if terminate then`

` make node leaf`

` else`

` split primitives into children`

` assign children as child nodes`

` foreach child in node:`

` push child onto stack`

The main aspects of top-down construction is defined by the `terminate`

function and the `split`

function.

The three most common approaches to splitting a given node are as follows:

**Spatial median**: cut bounding box in the middle (choosing an axis)**Object median**: organize scene primitives and split into two halves (roughly equal number on each side)**Cost model**: use an heuristic to determine "best" cut

Using a cost model is slower but generates better quality BVH trees than using spatial or object splits. However, the SAH cannot be calculated until the entire tree is complete. From a top-down approach, costs must be approximated.

The approximation of the SAH treats unbuilt subtrees as leaf nodes and calculates the cost as

To avoid calculating the cost of every possible split, which can be especially difficult at the top of the tree, primitives are often *binned* into regions so that only a subset of all possible splitting planes are evaluated.

In the simplest method, the top-down approach terminates when the approximated cost is higher than the cost of a leaf node.

Bottom-up builds generally start by clustering primitives. In each iteration, candidate clusters are merged together until all primitives have been merged into a single structure.

`xxxxxxxxxx`

`while clusters > 1:`

` identify nearest clusters`

` assign clusters as child nodes`

` merge into parent cluster`

The lowest cost is created by merging the closest pairs in each iteration. However, determining the closest cluster pair may be too time consuming and approximations can be made using morton curves.

Here is an overview of the most commonly used BVH building algorithms. BVH trees can also be further optimized with insertion-based methods or refitted for dynamic scenes.

👍 : Good quality, improves on Kd trees

❌ : Slow

- Create
bins and track the number of primitives$K$ and the bounding box extents.${n}_{i}$ - Assign bin IDs to primitives using a coarse-grained function on centroid locations and add the primitive to the bins.
- Recursively continue until the number of primitives reaches a threshold or the bounding box becomes too small.

👍 : Very high quality

❌ : Slow build, more primitive references

- Calculate a object-split candidate as in BBVH
- If the object-split overlap between the two child nodes is large (i.e.
), also calculate a spatial-split candidate. The spatial split can result in several configurations for clipped primitives and the surface area cost can be assessed for each option.$\frac{SA({n}_{1}\cap {n}_{2})}{SA(N)}>\alpha $ - Calculate the SAH cost for each candidate (object and spatial) and proceed with best option.
- Recurse until complete.

👍 : Very fast, parallel sort can be parallelized

❌ : Does not optimize for SAH

- Sort all scene primitves by Morton code.
- Build BVH tree by splitting primitives when a digit in the Morton code changes (i.e. from 0 to 1).
- Post-process from bottom-up to group "singleton" nodes.
- Calculate appropriate bounding box for each node.

* Hybrid*: A hybrid version of LBVH + SAH can be created by using LBVH for initial splits, then SAH when there is enough splits to efficiently parallelize.

👍 : Very fast, avoids singleton nodes

❌ : Lower quality than SAH, may perform worse than LBVH in some scenes

- Sort all scene primitives by significant bits of Morton code (by compress-sort-decompress method).
- Sort by remaining bits of Morton code.
- Use Morton codes to form treelets to avoid creating "singleton" nodes.
- Calculate appropriate bounding box for each node.

* Hybrid*: A hybrid version of SAH + HLBVH can be created by using coarse significant bits of Morton code as bins for SAH, then using HLBVH for the remaining bits.

👍 : Bottom-up approach, high quality, can be parallelized

❌ : Requires large stack state

- Sort all scene primitives by Morton code of bounding box center.
- Build constraint tree from the top down by bisecting primitives at each level until the number of primitives in each subtree is smaller than a threshold
.$\delta $ - Combine primitives within each subtree by finding the closest neighbours until there are
clusters (based on relative bounding box size). Operations in each subtree can be executed in parallel.$n$ - Calculate SAH and determine if subtree should be collapsed back into leaf node.
- Merge with another subtree and continue combining clusters until there are
clusters.$n$ - Terminate when the subtree holds a single cluster.

👍 : Generates high quality trees, parallel algorithm

❌ : Requires an initial BVH, localized optimizations

- Build an initial BVH using a fast method such as LBVH.
- Perform bottom-up traversal. For each node, create a treelet of its descendants and optimize the treelet using SAH. This step can be executed in parallel.
- Update BVH and continue traversal upwards.

👍 : Targets GPUs, high quality trees, fully bottom-up approach

❌ : Less effective on smaller scenes

- Sort all scene primitives by Morton code of bounding box center.
- Search for nearest neighbour for each primitive within a set distance
(only search locally).$r$ - Merge clusters that are mutually the nearest neighbour. There should be at least 1 merge in each iteration.
- Use parallel prefix scan to identify invalid merged clusters and remove gaps along the Morton curve as clusters are merged.
- Terminate when all clusters have been merged into a single tree.
- Each leaf node only holds a single primitive. Refine tree using a bottom-up approach to calculate SAH costs and collapse subtrees.

👍 : Optimizes global cost, parallel algorithm

❌ : Requires an initial BVH

- Build an initial BVH using a fast method such as LBVH.
- In parallel, calculate the SAH cost changes of removing a leaf node and re-inserting it elsewhere in the tree. Traverse up, then down, merging and adjusting the bounding volume to find the most optimal location.
- Terminate when the improvement to the overall SAH cost is less than some
.$\u03f5$

Some works evaluate these BVH build algorithms for their actual ray tracing performance versus their SAH cost. Hippie includes most implementations.

*On Quality Metrics of Bounding Volume Hierarchies*^{6}*Performance Comparison of BVHs for GPU Ray Tracing*^{16}

To date, the best ground-up BVH builder is SBVH. However, additional modifications such as PRBVH can improve SBVH further.

Structures beyond the basic binary BVH can be efficient for specific situations such as dynamic geometry or SIMD traversal.

To support dynamic scene geometry and accelerate BVH builds for large scenes, modern BVH trees are commonly organized into a two level hierarchy, featuring a *top-level acceleration structure (TLAS)* and many *bottom-level acceleration structures (BLAS)*.

BLAS structures are regular BVH trees holding the geometry primitives. However, one BLAS object may be instanced many times in the scene. The TLAS structure builds a BVH tree over instances of BLAS objects. The leaf nodes of a TLAS include a transformtion from world-space into the instance-space and a pointer to the appropriate BLAS.

Some APIs offer more than two levels of division, or the option of a * programmable instance* with

SIMD ray traversals can benefit from wide BVH trees by performing ray intersection tests in parallel at each step. However, there is still poor SIMD efficiency when the entire group must traverse a subtree only few threads hit. There are also more empty slots in wide BVH trees that limit efficiency.

Wide BVH trees can either be collapsed from a binary BVH tree or built on their own.

Beyond the construction of BVH structures and the associated performance metrics, another key component is the memory layout. The way BVH nodes are stored in memory significantly affects the traversal performance. There are optimizations to promote cache hit rates, to take advantage of locality, and to reduce memory footprint.

A BVH node needs to store at least the bounding volume (represented by a minimum and maximum extent in 3D space) and the location of its children (can be implicit, indexed, addressed, etc.).

BVH trees are usually mapped to memory in relation to the traversal patterns. The most basic forms are stored in memory in * Depth-First Order* and

Data size of BVH trees are greatly affected by bounding boxes and child pointers. The bounding boxes can easily be expressed relative to its parent for a reduced storage size. These values are also often quantized in a conservative fashion. In addition, child pointers are often stored as offsets. Several works explore different ways to compress this data:

- RACBVH - Random-Accessible Compressed BVH
^{21} - HMQ - Hierarchical Mesh Quantization
^{22} - CWBVH - Compressed Wide BVH
^{7}

Although BVH trees are the default standard in ray tracing applications today, ray tracing did not always rely on BVHs. Earlier alternative spatial structures to accelerate ray traversal include the following:

Acceleration Structure | Details |
---|---|

Uniform Grid | - Bounding boxes are subdivided in 3 dimensions (uniform or non-uniform) - Number of voxels is derived from number of objects in scene |

Binary Space Partitioning (BSP) | - Binary space partitioning - Each node is split equality into two child boxes of the same size |

Kd-Tree | - Splitting planes are placed based on cost function - Each node is split into two child boxes by splitting plane |

Octree | - Each box is subdivided into eight child boxes of the same size |

Grid Hierarchy | - Each cell of the uniform grid is subdivided with other uniform grid |

Some of the reasons why BVH prevailed are as follows:

- The memory complexity is bounded based on the number of primitives in the scene. Given a BVH structure, the memory usage can be easily computed from the scene primitives. For example, in a binary BVH tree, the BVH contains
*at most* nodes.$2n-1$ - BVH trees divide space based on the scene contents, allowing for small detailed clusters and empty spaces. For example, placing a detailed teapot in a large, empty stadium can be problematic for many spatial division techniques, but BVH trees allow efficient pruning to avoid testing with empty sections of space.
- Bounding volumes can easily be refitted for dynamic scenes. BVH refitting is a fast alternative to rebuilding the entire structure when there are small geometry changes. The changes are often small due to temporal locality.
- There is a large variety of different BVH build algorithms, creating a range of trade-offs between construction speed and tree quality. Applications can choose the best balance of speed versus quality for their own purposes (i.e. real-time games, CAD renderings, production-quality films, etc.).

A full comparison of BVH versus Kd-tree is evaluated in *Performance Comparison of Bounding Volume Hierarchies and Kd-Trees for GPU Ray Tracing*^{23}

1 D. Meister, S. Ogaki, C. Benthin, M. J. Doyle, M. Guthe, and J. Bittner, “A survey on bounding volume hierarchies for ray tracing,” in *Computer Graphics Forum*, 2021, vol. 40, pp. 683–712. ↩

2 V. Havran, “Heuristic ray shooting algorithms,” PhD Thesis, Czech Technical University in Prague, 2000. ↩

3 J. Bittner and V. Havran, “RDH: ray distribution heuristics for construction of spatial data structures,” in *Proceedings of Spring Conference on Computer Graphics (SCCG)*, 2009, pp. 51–58. ↩

4 B. Fabianowski, C. Fowler, and J. Dingliana, “A Cost Metric for Scene-Interior Ray Origins.,” in *Eurographics (Short Papers)*, 2009, pp. 49–52. ↩

5 M. Vinkler, V. Havran, and J. Sochor, “Visibility driven BVH build up algorithm for ray tracing,” *Computers & Graphics*, vol. 36, no. 4, pp. 283–296, 2012. ↩

6 T. Aila, T. Karras, and S. Laine, “On quality metrics of bounding volume hierarchies,” in *Proceedings of High Performance Graphics (HPG)*, 2013, pp. 101–107. ↩ ↩ ↩

7 H. Ylitie, T. Karras, and S. Laine, “Efficient incoherent ray traversal on GPUs through compressed wide BVHs,” in *Proceedings of High Performance Graphics (HPG)*, 2017, pp. 1–13. ↩ ↩

8 I. Wald, “On fast construction of SAH-based bounding volume hierarchies,” in *IEEE Symposium on Interactive Ray Tracing*, 2007, pp. 33–40. ↩

9 M. Stich, H. Friedrich, and A. Dietrich, “Spatial splits in bounding volume hierarchies,” in *Proceedings of High Performance Graphics (HPG)*, 2009, pp. 7–13. ↩

10 C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, and D. Manocha, “Fast BVH construction on GPUs,” in *Computer Graphics Forum*, 2009, vol. 28, pp. 375–384. ↩

11 J. Pantaleoni and D. Luebke, “HLBVH: hierarchical LBVH construction for real-time ray tracing of dynamic geometry,” in *Proceedings of High Performance Graphics (HPG)*, 2010, pp. 87–95. ↩

12 Y. Gu, Y. He, K. Fatahalian, and G. Blelloch, “Efficient BVH construction via approximate agglomerative clustering,” in *Proceedings of High Performance Graphics (HPG)*, 2013, pp. 81–88. ↩

13 T. Karras and T. Aila, “Fast parallel construction of high-quality bounding volume hierarchies,” in *Proceedings of High Performance Graphics (HPG)*, 2013, pp. 89–99. ↩

14 D. Meister and J. Bittner, “Parallel locally-ordered clustering for bounding volume hierarchy construction,” *IEEE Transactions on Visualization and Computer Graphics (TVCG)*, vol. 24, no. 3, pp. 1345–1353, 2017. ↩

15 D. Meister and J. Bittner, “Parallel reinsertion for bounding volume hierarchy optimization,” in *Computer Graphics Forum*, 2018, vol. 37, no. 2, pp. 463–473. ↩

16 D. Meister and J. Bittner, “Performance Comparison of Bounding Volume Hierarchies for GPU Ray Tracing,” *Journal of Computer Graphics Techniques (JCGT)*, vol. 11, no. 4. ↩

17 W.-J. Lee, G. Liktor, and K. Vaidyanathan, “Flexible Ray Traversal with an Extended Programming Model,” in *International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH Asia)*, 2019, pp. 17–20. ↩

18 C. Benthin, S. Woop, I. Wald, and A. T. Áfra, “Improved two-level BVHs using partial re-braiding,” in *Proceedings of High Performance Graphics (HPG)*, 2017. ↩

19 S.-E. Yoon and D. Manocha, “Cache-efficient layouts of bounding volume hierarchies,” in *Computer Graphics Forum*, 2006, vol. 25, no. 3, pp. 507–516. ↩

20 D. Wodniok, A. Schulz, S. Widmer, and M. Goesele, “Analysis of Cache Behavior and Performance of Different BVH Memory Layouts for Tracing Incoherent Rays.,” in *EGPGV@ Eurographics*, 2013, pp. 57–64. ↩

21 T.-J. Kim, B. Moon, D. Kim, and S.-E. Yoon, “RACBVHs: Random-accessible compressed bounding volume hierarchies,” *IEEE Transactions on Visualization and Computer Graphics (TVCG)*, vol. 16, no. 2, pp. 273–286, 2009. ↩

22 B. Segovia and M. Ernst, “Memory efficient ray tracing with hierarchical mesh quantization,” in *Proceedings of Graphics Interface*, 2010, pp. 153–160. ↩

23 M. Vinkler, V. Havran, and J. Bittner, “Performance Comparison of Bounding Volume Hierarchies and Kd-Trees for GPU Ray Tracing,” in *Computer Graphics Forum*, 2016, vol. 35, no. 8, pp. 68–79. ↩