The Ultimate Guide to Bounding Volume Hierarchies

Bounding Volume Hierarchy (BVH) trees are ubiquitous in modern ray tracing applications and help accelerate the search for ray-triangle intersections in a scene by organizing geometry into progressively tighter axis-aligned bounding boxes.

This document is based on the survey on bounding volume hierarchies for ray tracing 1.

Construction

The optimal acceleration structure for ray tracing is believed to be an NP-hard problem2, so construction relies on heuristics and greedy algorithms. Since the goal of using an acceleration structure is to accelerate ray traversal, the ultimate goal is to reduce rendering time. Heuristics attempt to calculate the relative rendering time using different BVH structures.

The Surface Area Heuristic

The SAH assumes a few properties:

• All rays intersect the root node bounding box
• Ray directions are uniformly distributed
• None of the rays intersect any objects (begin outside of the BVH and are infinitely long)
$\begin{array}{}\text{(1)}& SAH:=\frac{1}{SA\left(N\right)}\left({C}_{inner}\sum _{nϵI}SA\left(n\right)+{C}_{tri}\sum _{nϵL}{T}_{n}SA\left(n\right)\right)=\sum _{nϵN}{C}_{n}\frac{SA\left(n\right)}{SA\left(N\right)}\end{array}$

The probability of an intersection is approximated by the surface area of the node. The larger the surface area, the more likely a random infinite ray will intersect the node. Each intersection is associated with a cost and the cost is minimized by bounding the geometry using the minimal surface area.

The expected cost of traversing a BVH tree is the sum of the costs of each interaction (i.e. ray-box, ray-triangle), weighted by their surface area relative to the total size of the root node.

Other Heuristics

Although SAH is the most common heuristic because it applies to the most general case, there are also other heuristics to optimize for specific cases.

RDH - Ray Distribution Heuristic3

The assumptions made with SAH that rays are infinitely long is not true and the cost can change drastically with the average length of rays. RDH considers real ray distribution by sampling a small set of rays and tracking the number of hits on each node.

$\begin{array}{}\text{(2)}& RDH:=\sum _{nϵN}{C}_{n}\frac{Hits\left(n\right)}{Hits\left(N\right)}\end{array}$

Instead of using the proportional surface area, use the proportional number of hits at each node.

Solid Angle Projection4

The first SAH asssumption that requires rays to originate outside the root node bounding box is very unrealistic for secondary rays. Secondary rays originate from an intersection point in the scene and thus always originate inside the root node bounding volume.

$\begin{array}{}\text{(3)}& SAP:=\sum _{nϵN}{C}_{n}\left(\frac{V\left(n\right)}{V\left(N\right)}+\frac{1}{V\left(S\right)}\sum _{i}{V}^{\prime }\left({R}_{i}\right)\frac{SA\left(\left({n}_{i}\right)}{SA\left({R}_{i}^{\prime }\right)}\right)\end{array}$

The probability of a ray hitting node $n$$n$ is approximated as the probability that it originates from node $n$$n$ in addition to the probability of it originating from somewhere in the root node bounding volume, then hitting node $n$$n$. The second part of the equation accounts for the probability of the ray from each region in a subdivided space and weighted by the angle to node $n$$n$ for likelihood of hitting the node.

OSAH - Occlusion Heuristic 5

It may be useful to consider occlusion and proximity of geometry to ray in cost model when building BVH. Geometry marked as occluded is preferred to be built into a different subtree. While they can still be intersected, the traversal cost is higher for previously occluded geometry in this heuristic.

$\begin{array}{}\text{(4)}& OH:=\sum _{nϵN}{C}_{n}\frac{O\left(n\right)}{O\left(N\right)}\end{array}$

Occlusion information can be gathered from previous frames in an animation or by sampling a representative set of rays.

EPO - End Point Overlap6

Overlapping nodes lower performance because a ray must traverse multiple subtrees to find the closest hit.

$\begin{array}{}\text{(5)}& EPO:=\sum _{nϵN}{C}_{n}\frac{SA\left(\left(N\setminus Q\left(n\right)\right)\cap n\right)}{SA\left(N\right)}\end{array}$

In this metric, $Q\left(n\right)$$Q(n)$ is the set of surface belonging to subtree $n$$n$. This sums the cost of all overlapping nodes using the surface area of all nodes that do not belong to the current subtree but falls in its volume.

LCV - Leaf Count Variability6

In SIMD execution, warps remain together in each part of the while loop so variances in leaf nodes delay the inner node loop.

$\begin{array}{}\text{(6)}& LCV:=\sqrt{E\left[{N}_{l}^{2}\right]-E\left[{N}_{l}{\right]}^{2}}\end{array}$

This measures the standard deviation of the number of leaf nodes ${N}_{l}$$N_l$ intersected by a ray.

CWBVH - Compressed Wide BVH7

Special cost functions are used to convert a binary BVH to wide BVH trees. One case of a wide BVH tree is the compressed wide BVH tree that has up to eight children per node.

When appropriate, nodes are either converted to one of eight child nodes or an inner node with a distributed forest with ${C}_{distribute}$$C_{distribute}$ as the minimum SAH cost.

Algorithms

In general, BVH build algorithms can be categorized either as bottom-up or top-down, each with their own advantages. Starting from the top helps focus optimizations at the top levels of the tree, which may be more critical to traversal performance. However, starting from the bottom optimizes at a finer granularity and usually generates BVH trees with a lower global cost.

Top-Down Construction

xxxxxxxxxxpush root onto stackwhile stack is not empty:  pop node from stack  if terminate then    make node leaf  else    split primitives into children    assign children as child nodes    foreach child in node:      push child onto stack

The main aspects of top-down construction is defined by the terminate function and the split function.

Split

The three most common approaches to splitting a given node are as follows:

• Spatial median: cut bounding box in the middle (choosing an axis)
• Object median: organize scene primitives and split into two halves (roughly equal number on each side)
• Cost model: use an heuristic to determine "best" cut

Using a cost model is slower but generates better quality BVH trees than using spatial or object splits. However, the SAH cannot be calculated until the entire tree is complete. From a top-down approach, costs must be approximated.

$\begin{array}{}\text{(8)}& C\left(n\right)\approx {C}_{trav}+{C}_{int}\sum _{n}\frac{SA\left(n\right)}{SA\left(N\right)}|n|=\stackrel{^}{c}\left(n\right)\end{array}$

The approximation of the SAH treats unbuilt subtrees as leaf nodes and calculates the cost as ${C}_{int}$$C_{int}$ multiplied by $|n|$$|n|$ number of primitives in the node.

To avoid calculating the cost of every possible split, which can be especially difficult at the top of the tree, primitives are often binned into regions so that only a subset of all possible splitting planes are evaluated.

Terminate

In the simplest method, the top-down approach terminates when the approximated cost is higher than the cost of a leaf node.

$\begin{array}{}\text{(9)}& {C}_{int}|n|\le \stackrel{^}{c}\left(n\right)\end{array}$

Bottom-Up Construction

Bottom-up builds generally start by clustering primitives. In each iteration, candidate clusters are merged together until all primitives have been merged into a single structure.

xxxxxxxxxxwhile clusters > 1:  identify nearest clusters  assign clusters as child nodes  merge into parent cluster

The lowest cost is created by merging the closest pairs in each iteration. However, determining the closest cluster pair may be too time consuming and approximations can be made using morton curves.

Some Commonly Used Algorithms

Here is an overview of the most commonly used BVH building algorithms. BVH trees can also be further optimized with insertion-based methods or refitted for dynamic scenes.

BBVH - Binned BVH8

👍 : Good quality, improves on Kd trees

: Slow

1. Create $K$$K$ bins and track the number of primitives ${n}_{i}$$n_i$ and the bounding box extents.
2. Assign bin IDs to primitives using a coarse-grained function on centroid locations and add the primitive to the bins.
3. Recursively continue until the number of primitives reaches a threshold or the bounding box becomes too small.
SBVH - Spatial BVH9

👍 : Very high quality

: Slow build, more primitive references

1. Calculate a object-split candidate as in BBVH
2. If the object-split overlap between the two child nodes is large (i.e. $\frac{SA\left({n}_{1}\cap {n}_{2}\right)}{SA\left(N\right)}>\alpha$$\frac{SA(n_1 \cap n_2)}{SA(N)} > \alpha$), also calculate a spatial-split candidate. The spatial split can result in several configurations for clipped primitives and the surface area cost can be assessed for each option.
3. Calculate the SAH cost for each candidate (object and spatial) and proceed with best option.
4. Recurse until complete.
LBVH - Linear BVH10

👍 : Very fast, parallel sort can be parallelized

: Does not optimize for SAH

1. Sort all scene primitves by Morton code.
2. Build BVH tree by splitting primitives when a digit in the Morton code changes (i.e. from 0 to 1).
3. Post-process from bottom-up to group "singleton" nodes.
4. Calculate appropriate bounding box for each node.

Hybrid: A hybrid version of LBVH + SAH can be created by using LBVH for initial splits, then SAH when there is enough splits to efficiently parallelize.

HLBVH - Hierarchical Linear BVH11

👍 : Very fast, avoids singleton nodes

: Lower quality than SAH, may perform worse than LBVH in some scenes

1. Sort all scene primitives by significant bits of Morton code (by compress-sort-decompress method).
2. Sort by remaining bits of Morton code.
3. Use Morton codes to form treelets to avoid creating "singleton" nodes.
4. Calculate appropriate bounding box for each node.

Hybrid: A hybrid version of SAH + HLBVH can be created by using coarse significant bits of Morton code as bins for SAH, then using HLBVH for the remaining bits.

AAC - Approximate Agglomerative Clustering12

👍 : Bottom-up approach, high quality, can be parallelized

: Requires large stack state

1. Sort all scene primitives by Morton code of bounding box center.
2. Build constraint tree from the top down by bisecting primitives at each level until the number of primitives in each subtree is smaller than a threshold $\delta$$\delta$.
3. Combine primitives within each subtree by finding the closest neighbours until there are $n$$n$ clusters (based on relative bounding box size). Operations in each subtree can be executed in parallel.
4. Calculate SAH and determine if subtree should be collapsed back into leaf node.
5. Merge with another subtree and continue combining clusters until there are $n$$n$ clusters.
6. Terminate when the subtree holds a single cluster.
TRBVH - Treelet Restructuring BVH13

👍 : Generates high quality trees, parallel algorithm

: Requires an initial BVH, localized optimizations

1. Build an initial BVH using a fast method such as LBVH.
2. Perform bottom-up traversal. For each node, create a treelet of its descendants and optimize the treelet using SAH. This step can be executed in parallel.
3. Update BVH and continue traversal upwards.
PLOC - Parallel Locally-Ordered Clustering14

👍 : Targets GPUs, high quality trees, fully bottom-up approach

: Less effective on smaller scenes

1. Sort all scene primitives by Morton code of bounding box center.
2. Search for nearest neighbour for each primitive within a set distance $r$$r$ (only search locally).
3. Merge clusters that are mutually the nearest neighbour. There should be at least 1 merge in each iteration.
4. Use parallel prefix scan to identify invalid merged clusters and remove gaps along the Morton curve as clusters are merged.
5. Terminate when all clusters have been merged into a single tree.
6. Each leaf node only holds a single primitive. Refine tree using a bottom-up approach to calculate SAH costs and collapse subtrees.
PRBVH - Parallel Reinsertion BVH15

👍 : Optimizes global cost, parallel algorithm

: Requires an initial BVH

1. Build an initial BVH using a fast method such as LBVH.
2. In parallel, calculate the SAH cost changes of removing a leaf node and re-inserting it elsewhere in the tree. Traverse up, then down, merging and adjusting the bounding volume to find the most optimal location.
3. Terminate when the improvement to the overall SAH cost is less than some $ϵ$$\epsilon$.

Performance

Some works evaluate these BVH build algorithms for their actual ray tracing performance versus their SAH cost. Hippie includes most implementations.

• On Quality Metrics of Bounding Volume Hierarchies6
• Performance Comparison of BVHs for GPU Ray Tracing16

To date, the best ground-up BVH builder is SBVH. However, additional modifications such as PRBVH can improve SBVH further.

Other BVHs

Structures beyond the basic binary BVH can be efficient for specific situations such as dynamic geometry or SIMD traversal.

Multi-Level Hierarchies

To support dynamic scene geometry and accelerate BVH builds for large scenes, modern BVH trees are commonly organized into a two level hierarchy, featuring a top-level acceleration structure (TLAS) and many bottom-level acceleration structures (BLAS).

BLAS structures are regular BVH trees holding the geometry primitives. However, one BLAS object may be instanced many times in the scene. The TLAS structure builds a BVH tree over instances of BLAS objects. The leaf nodes of a TLAS include a transformtion from world-space into the instance-space and a pointer to the appropriate BLAS.

Some APIs offer more than two levels of division, or the option of a programmable instance with traversal shaders17. Other works re-structure multi-level hierarchies through a process called re-braiding18 by building the TLAS over BLAS subtrees in order to improve SAH costs.

Wide BVH

SIMD ray traversals can benefit from wide BVH trees by performing ray intersection tests in parallel at each step. However, there is still poor SIMD efficiency when the entire group must traverse a subtree only few threads hit. There are also more empty slots in wide BVH trees that limit efficiency.

Wide BVH trees can either be collapsed from a binary BVH tree or built on their own.

BVH Layout

Beyond the construction of BVH structures and the associated performance metrics, another key component is the memory layout. The way BVH nodes are stored in memory significantly affects the traversal performance. There are optimizations to promote cache hit rates, to take advantage of locality, and to reduce memory footprint.

A BVH node needs to store at least the bounding volume (represented by a minimum and maximum extent in 3D space) and the location of its children (can be implicit, indexed, addressed, etc.).

Tree Layout

BVH trees are usually mapped to memory in relation to the traversal patterns. The most basic forms are stored in memory in Depth-First Order and Breadth-First Order. Alternatively, proposals including Cache-Oblivious BVH (COLBVH)19 and Swapped Subtrees (SWST)20 organize the tree into subtree clusters in memory. However, the tree layout does not significantly affect performance and speedups vary by application and scene.

Compression

Data size of BVH trees are greatly affected by bounding boxes and child pointers. The bounding boxes can easily be expressed relative to its parent for a reduced storage size. These values are also often quantized in a conservative fashion. In addition, child pointers are often stored as offsets. Several works explore different ways to compress this data:

• RACBVH - Random-Accessible Compressed BVH21
• HMQ - Hierarchical Mesh Quantization22
• CWBVH - Compressed Wide BVH7

Why BVH?

Although BVH trees are the default standard in ray tracing applications today, ray tracing did not always rely on BVHs. Earlier alternative spatial structures to accelerate ray traversal include the following:

Some of the reasons why BVH prevailed are as follows:

• The memory complexity is bounded based on the number of primitives in the scene. Given a BVH structure, the memory usage can be easily computed from the scene primitives. For example, in a binary BVH tree, the BVH contains at most $2n-1$$2n-1$ nodes.
• BVH trees divide space based on the scene contents, allowing for small detailed clusters and empty spaces. For example, placing a detailed teapot in a large, empty stadium can be problematic for many spatial division techniques, but BVH trees allow efficient pruning to avoid testing with empty sections of space.
• Bounding volumes can easily be refitted for dynamic scenes. BVH refitting is a fast alternative to rebuilding the entire structure when there are small geometry changes. The changes are often small due to temporal locality.
• There is a large variety of different BVH build algorithms, creating a range of trade-offs between construction speed and tree quality. Applications can choose the best balance of speed versus quality for their own purposes (i.e. real-time games, CAD renderings, production-quality films, etc.).

A full comparison of BVH versus Kd-tree is evaluated in Performance Comparison of Bounding Volume Hierarchies and Kd-Trees for GPU Ray Tracing23

1 D. Meister, S. Ogaki, C. Benthin, M. J. Doyle, M. Guthe, and J. Bittner, “A survey on bounding volume hierarchies for ray tracing,” in Computer Graphics Forum, 2021, vol. 40, pp. 683–712.
2 V. Havran, “Heuristic ray shooting algorithms,” PhD Thesis, Czech Technical University in Prague, 2000.
3 J. Bittner and V. Havran, “RDH: ray distribution heuristics for construction of spatial data structures,” in Proceedings of Spring Conference on Computer Graphics (SCCG), 2009, pp. 51–58.
4 B. Fabianowski, C. Fowler, and J. Dingliana, “A Cost Metric for Scene-Interior Ray Origins.,” in Eurographics (Short Papers), 2009, pp. 49–52.
5 M. Vinkler, V. Havran, and J. Sochor, “Visibility driven BVH build up algorithm for ray tracing,” Computers & Graphics, vol. 36, no. 4, pp. 283–296, 2012.
6 T. Aila, T. Karras, and S. Laine, “On quality metrics of bounding volume hierarchies,” in Proceedings of High Performance Graphics (HPG), 2013, pp. 101–107.
7 H. Ylitie, T. Karras, and S. Laine, “Efficient incoherent ray traversal on GPUs through compressed wide BVHs,” in Proceedings of High Performance Graphics (HPG), 2017, pp. 1–13.
8 I. Wald, “On fast construction of SAH-based bounding volume hierarchies,” in IEEE Symposium on Interactive Ray Tracing, 2007, pp. 33–40.
9 M. Stich, H. Friedrich, and A. Dietrich, “Spatial splits in bounding volume hierarchies,” in Proceedings of High Performance Graphics (HPG), 2009, pp. 7–13.
10 C. Lauterbach, M. Garland, S. Sengupta, D. Luebke, and D. Manocha, “Fast BVH construction on GPUs,” in Computer Graphics Forum, 2009, vol. 28, pp. 375–384.
11 J. Pantaleoni and D. Luebke, “HLBVH: hierarchical LBVH construction for real-time ray tracing of dynamic geometry,” in Proceedings of High Performance Graphics (HPG), 2010, pp. 87–95.
12 Y. Gu, Y. He, K. Fatahalian, and G. Blelloch, “Efficient BVH construction via approximate agglomerative clustering,” in Proceedings of High Performance Graphics (HPG), 2013, pp. 81–88.
13 T. Karras and T. Aila, “Fast parallel construction of high-quality bounding volume hierarchies,” in Proceedings of High Performance Graphics (HPG), 2013, pp. 89–99.
14 D. Meister and J. Bittner, “Parallel locally-ordered clustering for bounding volume hierarchy construction,” IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 24, no. 3, pp. 1345–1353, 2017.
15 D. Meister and J. Bittner, “Parallel reinsertion for bounding volume hierarchy optimization,” in Computer Graphics Forum, 2018, vol. 37, no. 2, pp. 463–473.
16 D. Meister and J. Bittner, “Performance Comparison of Bounding Volume Hierarchies for GPU Ray Tracing,” Journal of Computer Graphics Techniques (JCGT), vol. 11, no. 4.
17 W.-J. Lee, G. Liktor, and K. Vaidyanathan, “Flexible Ray Traversal with an Extended Programming Model,” in International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH Asia), 2019, pp. 17–20.
18 C. Benthin, S. Woop, I. Wald, and A. T. Áfra, “Improved two-level BVHs using partial re-braiding,” in Proceedings of High Performance Graphics (HPG), 2017.
19 S.-E. Yoon and D. Manocha, “Cache-efficient layouts of bounding volume hierarchies,” in Computer Graphics Forum, 2006, vol. 25, no. 3, pp. 507–516.
20 D. Wodniok, A. Schulz, S. Widmer, and M. Goesele, “Analysis of Cache Behavior and Performance of Different BVH Memory Layouts for Tracing Incoherent Rays.,” in EGPGV@ Eurographics, 2013, pp. 57–64.
21 T.-J. Kim, B. Moon, D. Kim, and S.-E. Yoon, “RACBVHs: Random-accessible compressed bounding volume hierarchies,” IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 16, no. 2, pp. 273–286, 2009.
22 B. Segovia and M. Ernst, “Memory efficient ray tracing with hierarchical mesh quantization,” in Proceedings of Graphics Interface, 2010, pp. 153–160.
23 M. Vinkler, V. Havran, and J. Bittner, “Performance Comparison of Bounding Volume Hierarchies and Kd-Trees for GPU Ray Tracing,” in Computer Graphics Forum, 2016, vol. 35, no. 8, pp. 68–79.