GPU Accelerated Algorithm for 3D Delaunay Triangulation

undefined
THANH-TUNG CAO, TODD
MINGCEN GAO
TIOW-SENG TAN
National University of
Singapore
A GPU Accelerated Algorithm
for 3D Delaunay Triangulation
ASHWIN NANJAPPA
Bioinformatics Institute
Singapore
Outline
9/17/2024
i3D '14
2
Background
Related work
Algorithm
Implementation
Result
 Background
 
Background
9/17/2024
i3D '14
3
Delaunay Triangulation
Empty ball property
 
Outline 
 Applications
2D
3D
Background
9/17/2024
i3D '14
4
Applications:
Graphics
CAD
Visualization
Scientific computation
 Related work
 
 Sequential algorithms
Related work
9/17/2024
i3D '14
5
 
Sequential algorithms:
Incremental construction
Divide-and-conquer
Incremental insertion
Points are inserted one by one
Triangulation is locally fixed after each insertion
 
Bowyer-Watson’s algorithm [1981]
Flipping algorithm [Joe 1991]
 
Background 
 Parallel algorithms
Related work
9/17/2024
i3D '14
6
Parallel and multi-core algorithms:
Incremental construction 
 high complexity
Domain partitioning 
 GPU needs thousands of partitions
Incremental insertion [Batista et al. 2010]
Several points are inserted at a time
Each insertion modifies a small region
Conflict 
 Rollback
 
 
 GPU algorithms
Related work
9/17/2024
i3D '14
7
GPU algorithms:
Experiment with Batista et al.’s approach
1 million points
At most 2000-3000 points can be inserted in each round
Digital Voronoi diagram approach [Qi et al. 2012]
Dualization not working in 3D
 Algorithm
 
 Our algorithm
Algorithm
9/17/2024
i3D '14
8
 
Overall approach:
Insert points in batches, each batch at most one point is
inserted into a tetrahedron
Use flipping to get close to the DT
Use star-splaying in CPU to fix [Shewchuk 2005]
 
Problem: Flipping easily gets stuck!
100K points, 
 6,800 bad facets
(non-Delaunay and unflippable).
Worse for real-world data.
 
 
Related work 
 Insert - Flip
Algorithm
9/17/2024
i3D '14
9
Observation:
Do flipping after each round of point insertion
 Much better result.
 
 
 Star splaying
 
100K points, 96 bad facets (vs.
6,800)
Bunny model (36K points), 92
bad facets
 
Now correction on CPU is
acceptable.
 
 
Algorithm
9/17/2024
i3D '14
10
CPU correction: star splaying algorithm
Lifting map: w = x
2
 + y
2
 + z
2
Each vertex: construct a convex star.
Consistency: If the star of 
s 
contains tetrahedron 
stuv
, then the
star of 
t,
 
u, 
and
 v
 must also contain that tetrahedron.
 
 
 Difficulties
2D illustration
Algorithm
9/17/2024
i3D '14
11
CPU correction: star splaying algorithm
Difficulties: Time consuming
Construct convex stars (incremental insertion)
Check all the stars for inconsistencies
Convert stars back to mesh
representation
 
 
 Adaptive star splaying
Algorithm
9/17/2024
i3D '14
12
Adaptive star splaying
Only some small regions around
the bad facets are processed
Construct stars incident to the bad
facets.
Almost convex 
 use Flip-Flop
[Gao et al. 2013]
Need another star 
 derive from
the triangulation.
Almost output sensitive.
 
 
 Pseudo-code
Non-Delaunay &
unflippable
Non-convex
stars
Affected region
2D illustration
GPU Implementation
9/17/2024
i3D '14
13
Initialize;
While 
there are points not inserted
Pick one point per tetrahedron and insert;
While
 there are modified tetrahedra
Collect the modified tetrahedra;
Process and identify possible flips;
Perform the flips;
Update location of remaining points;
 
Algorithm 
 Divergence
GPU Implementation
9/17/2024
i3D '14
14
Thread divergence:
Compact the list of modified tetrahedra before processing
Exact predicates [Shewchuk 1996]
Use only fast check in 1
st
 kernel
Collect those that require exact computation
Do the exact computation in 2
nd
 kernel
Point location:
Store the flips, build flip history DAG
Trace the DAG to locate
 
 
 Memory
GPU Implementation
9/17/2024
i3D '14
15
Memory access:
Rearrange the data to improve the GPU cache efficiency.
Sort the input points by the Z-curve
Sort the tetrahedra list by the minimum vertex indices.
 Result
 
 Experiment setting
Result
9/17/2024
i3D '14
16
Experiment settings:
CPU: Intel I7 2600K 3.4Ghz, 16GB RAM
GPU: NVIDIA GTX 580, 3GB VRAM
CUDA 5.0, VS 2012.
CGAL 4.2
 
Implementation 
 3D Speedup
Result
9/17/2024
i3D '14
17
3D Speedup:
Synthetic data: Uniform, Gaussian, sphere, grid.
Up to 1.5 million points
8-10 times faster than CGAL
Real models: Armadillo, Angel, Dragon, Happy Buddha…
6-9 times speedup over CGAL
 
 
 2D Speedup
Result
9/17/2024
i3D '14
18
Also implement for 2D DT:
Synthetic data:
10 times over 
Triangle
, 7 times over CGAL
2 times faster than [Qi et al. 2012]
 
 
 Time breakdown
Result
9/17/2024
i3D '14
19
Time breakdown
 
 
 Insert-Flip vs. InsAll-Flip
Result
9/17/2024
i3D '14
20
Insert-Flip vs. InsertAll-Flip
100 times less bad facets
40% less flips
 Conclusion
 
 Conclusion
Conclusion
9/17/2024
i3D '14
21
New algorithm for DT construction on GPU
Both 2D and 3D (possibly higher)
Uniform and non-uniform point set
Exact computation, robust.
Limitation:
Needs to copy the result to CPU for splaying
Sequential flipping on some pathological cases
Memory bound implementation
 
Result 
 END
End
9/17/2024
i3D '14
22
Thank you!
GPU Implementation
9/17/2024
i3D '14
23
Point location:
Store the flips in all the iterations
Construct the history DAG of flips
Update the point location at the end
 
 
 Memory access
Result
9/17/2024
i3D '14
24
Stars involved in the CPU star splaying
Significantly more for non-uniform point sets
Still reasonably small
 
 
 Insert all - Flip
Slide Note
Embed
Share

Thanh-Tung Cao, Todd Mingcen Gao, Tiow-Seng Tan, and Ashwin Nanjappa from the National University of Singapore's Bioinformatics Institute present a GPU-accelerated algorithm for 3D Delaunay triangulation. Their work explores the background, related works, algorithm implementation, and results of this innovative approach. The algorithm's application in graphics, CAD, visualization, and scientific computation is highlighted, emphasizing both sequential and parallel algorithms. The authors discuss the challenges faced, such as flipping and bad facets, and propose solutions to enhance the Delaunay triangulation process, showcasing their outcomes at the i3D '14 conference.

  • GPU Accelerated
  • Algorithm
  • 3D Delaunay Triangulation
  • Parallel Algorithms
  • Bioinformatics

Uploaded on Sep 17, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. A GPU Accelerated Algorithm for 3D Delaunay Triangulation THANH-TUNG CAO, TODD MINGCEN GAO TIOW-SENG TAN ASHWIN NANJAPPA National University of Singapore Bioinformatics Institute Singapore

  2. Outline Background 2 Background Related work Algorithm Implementation Result 9/17/2024 i3D '14

  3. Background Outline 3 Delaunay Triangulation Empty ball property 3D 2D Applications 9/17/2024 i3D '14

  4. Background Related work 4 Applications: Graphics CAD Visualization Scientific computation Sequential algorithms 9/17/2024 i3D '14

  5. Related work Background 5 Sequential algorithms: Incremental construction Divide-and-conquer Incremental insertion Points are inserted one by one Triangulation is locally fixed after each insertion Bowyer-Watson s algorithm [1981] Flipping algorithm [Joe 1991] Parallel algorithms 9/17/2024 i3D '14

  6. Related work 6 Parallel and multi-core algorithms: Incremental construction high complexity Domain partitioning GPU needs thousands of partitions Incremental insertion [Batista et al. 2010] Several points are inserted at a time Each insertion modifies a small region Conflict Rollback GPU algorithms 9/17/2024 i3D '14

  7. Related work Algorithm 7 GPU algorithms: Experiment with Batista et al. s approach 1 million points At most 2000-3000 points can be inserted in each round Digital Voronoi diagram approach [Qi et al. 2012] Dualization not working in 3D Our algorithm 9/17/2024 i3D '14

  8. Algorithm Related work 8 Overall approach: Insert points in batches, each batch at most one point is inserted into a tetrahedron Use flipping to get close to the DT Use star-splaying in CPU to fix [Shewchuk 2005] Problem: Flipping easily gets stuck! 100K points, 6,800 bad facets (non-Delaunay and unflippable). Worse for real-world data. Insert - Flip 9/17/2024 i3D '14

  9. Algorithm 9 Observation: Do flipping after each round of point insertion Much better result. 100K points, 96 bad facets (vs. 6,800) Bunny model (36K points), 92 bad facets Now correction on CPU is acceptable. Star splaying 9/17/2024 i3D '14

  10. Algorithm 10 CPU correction: star splaying algorithm Lifting map: w = x2 + y2 + z2 Each vertex: construct a convex star. Consistency: If the star of s contains tetrahedron stuv, then the star of t,u, and v must also contain that tetrahedron. u u t s s t r v 2D illustration Difficulties 9/17/2024 i3D '14

  11. Algorithm 11 CPU correction: star splaying algorithm Difficulties: Time consuming Construct convex stars (incremental insertion) Check all the stars for inconsistencies Convert stars back to mesh representation Adaptive star splaying 9/17/2024 i3D '14

  12. Algorithm 12 Adaptive star splaying Only some small regions around the bad facets are processed Construct stars incident to the bad facets. Almost convex use Flip-Flop [Gao et al. 2013] Need another star derive from the triangulation. Non-Delaunay & unflippable Non-convex stars Almost output sensitive. Affected region 2D illustration Pseudo-code 9/17/2024 i3D '14

  13. GPU Implementation Algorithm 13 Initialize; While there are points not inserted Pick one point per tetrahedron and insert; While there are modified tetrahedra Collect the modified tetrahedra; Process and identify possible flips; Perform the flips; Update location of remaining points; Divergence 9/17/2024 i3D '14

  14. GPU Implementation 14 Thread divergence: Compact the list of modified tetrahedra before processing Exact predicates [Shewchuk 1996] Use only fast check in 1st kernel Collect those that require exact computation Do the exact computation in 2nd kernel Point location: Store the flips, build flip history DAG Trace the DAG to locate Memory 9/17/2024 i3D '14

  15. GPU Implementation Result 15 Memory access: Rearrange the data to improve the GPU cache efficiency. Sort the input points by the Z-curve Sort the tetrahedra list by the minimum vertex indices. Experiment setting 9/17/2024 i3D '14

  16. Result Implementation 16 Experiment settings: CPU: Intel I7 2600K 3.4Ghz, 16GB RAM GPU: NVIDIA GTX 580, 3GB VRAM CUDA 5.0, VS 2012. CGAL 4.2 3D Speedup 9/17/2024 i3D '14

  17. Result 17 3D Speedup: Synthetic data: Uniform, Gaussian, sphere, grid. Up to 1.5 million points 8-10 times faster than CGAL Real models: Armadillo, Angel, Dragon, Happy Buddha 6-9 times speedup over CGAL 2D Speedup 9/17/2024 i3D '14

  18. Result 18 Also implement for 2D DT: Synthetic data: 10 times over Triangle, 7 times over CGAL 2 times faster than [Qi et al. 2012] Time breakdown 9/17/2024 i3D '14

  19. Result 19 Time breakdown Insert-Flip vs. InsAll-Flip 9/17/2024 i3D '14

  20. Result Conclusion 20 Insert-Flip vs. InsertAll-Flip 100 times less bad facets 40% less flips Conclusion 9/17/2024 i3D '14

  21. Conclusion Result 21 New algorithm for DT construction on GPU Both 2D and 3D (possibly higher) Uniform and non-uniform point set Exact computation, robust. Limitation: Needs to copy the result to CPU for splaying Sequential flipping on some pathological cases Memory bound implementation END 9/17/2024 i3D '14

  22. End 22 Thank you! 9/17/2024 i3D '14

  23. GPU Implementation 23 Point location: Store the flips in all the iterations Construct the history DAG of flips Update the point location at the end Memory access 9/17/2024 i3D '14

  24. Result 24 Stars involved in the CPU star splaying Significantly more for non-uniform point sets Still reasonably small Insert all - Flip 9/17/2024 i3D '14

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#