Next-Generation Graphics APIs: Vulkan, D3D12, and Metal

Next-Generation Graphics APIs: Similarities and Differences
Tim Foley
NVIDIA Corporation
 
Next-Generation Graphics APIs
 
Vulkan, D3D12, and Metal
Coming to platforms you care about
 
Why do we want new APIs?
How are they different?
 
Why new Graphics APIs?
 
Reduce CPU overhead/bottlenecks
 
More stable/predictable driver performance
 
Explicit, console-like control
 
CPU Bottlenecks
 
Only single app thread creating GPU work
Can become bottleneck for complex scenes
Try to do as little on this thread as possible
Multi-threaded driver helps a bit
 
New APIs: multi-threaded work creation
 
Driver Overhead, Predictability
 
App submits a draw call, maps a buffer, etc.
Driver might:
  Compile shaders
  Insert fences into GPU schedule
  Flush caches
  Allocate memory
 
Explicit, Console-Like Control
 
Explicit synchronization
CPU/GPU sharing, RMW hazards, etc.
 
Explicit memory management
Allocate large memory region at load time
Handle sub-allocation in application code
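
As a rough illustration of sub-allocation in application code, here is a minimal sketch of a linear ("bump") allocator that hands out aligned offsets from one large region. The type LinearSubAllocator and its members are hypothetical names for this example, not part of Vulkan, D3D12, or Metal; a real engine would pair the offsets with an actual API heap and would likely need a more flexible scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical toy model: one big region is obtained up front, and the
// application hands out aligned offsets from it (a linear/bump allocator).
struct LinearSubAllocator {
    std::size_t capacity;    // size of the large region allocated at load time
    std::size_t offset = 0;  // first free byte

    explicit LinearSubAllocator(std::size_t bytes) : capacity(bytes) {}

    // Returns the offset of a block of `size` bytes aligned to `alignment`
    // (which must be a power of two), or std::nullopt if the region is full.
    std::optional<std::size_t> allocate(std::size_t size, std::size_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity || size > capacity - aligned) return std::nullopt;
        offset = aligned + size;
        return aligned;
    }

    // Release everything at once, e.g. when a level is unloaded.
    void reset() { offset = 0; }
};
```

Individual blocks are never freed in this sketch; everything is released together with reset(), which suits data that shares one lifetime.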
 
This Talk
 
Bootstrap your mental model
 
Introduce concepts shared across APIs
 
Point out major differences
Try to hand-wave the small ones
 
Big Topics
 
Command buffers
Pipeline state objects
Tiling
Resources / Binding
Hazards / Lifetime
 
Metal
 
Vulkan
 
D3D12
 
Command Buffers
 
 
Single-Threaded Submission
[Diagram: CPU Thread, driver, cmd stream, GPU Front-End]

Writing to a Command Buffer
[Diagram: CPU Thread, driver, cmd entries in a command buffer, GPU Front-End]

Submitting a Command Buffer
[Diagram, two frames: CPU Thread, driver, filled command buffer, Queue, GPU Front-End]

Start Writing to a New Buffer
[Diagram: CPU Thread, driver, previously submitted cmd entries in the Queue, GPU Front-End]

CPU-GPU Asynchrony
[Diagram, two frames: CPU Thread writing new cmd entries while the Queue drains into the GPU Front-End]
 
Command Buffers and Queues
 
Recording and Submitting
 
Record commands into command buffer
Record many buffers at once, across threads
 
Submit command buffer to a queue
GPU consumes in order submitted
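
The record-then-submit flow can be sketched with a toy model in standard C++. Everything here is an assumption made for illustration: CommandBuffer is just a list of strings, Queue is a mutex-protected list, and recordAndSubmit is a hypothetical helper. None of it is a real graphics API call.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy model, not a real API: a command buffer is a list of
// strings and the queue is a mutex-protected list of submitted buffers.
using CommandBuffer = std::vector<std::string>;

struct Queue {
    std::mutex lock;
    std::vector<CommandBuffer> submitted;  // consumed in submission order

    // The mutex lets any thread submit; the application owns this locking.
    void submit(CommandBuffer cb) {
        std::lock_guard<std::mutex> guard(lock);
        submitted.push_back(std::move(cb));
    }
};

// Each worker records into its own buffer (no locking needed while
// recording); the calling thread then submits the buffers in a fixed order.
std::vector<std::string> recordAndSubmit(int workerCount, int drawsPerWorker) {
    std::vector<CommandBuffer> buffers(workerCount);
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&buffers, w, drawsPerWorker] {
            for (int d = 0; d < drawsPerWorker; ++d)
                buffers[w].push_back("draw " + std::to_string(w) + "." + std::to_string(d));
        });
    }
    for (auto& t : workers) t.join();

    Queue queue;
    for (auto& cb : buffers) queue.submit(std::move(cb));

    // Stand-in for the GPU front-end: walk the queue in submission order.
    std::vector<std::string> executed;
    for (const auto& cb : queue.submitted)
        executed.insert(executed.end(), cb.begin(), cb.end());
    return executed;
}
```

Recording is the part that runs in parallel; the order the GPU sees is decided only by the order of submit calls.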

Multi-Threaded Submission
[Diagram sequence: four CPU Threads each record cmd entries into their own command buffers; a finished buffer (“done!”) moves to the Queue and is consumed by the GPU Front-End]
 
“Free Threading”
 
Call API functions from any thread
Not required to have one “render thread”
 
Application responsible for synchronization
Calls that read/write same API object(s)
Often, object is owned by one thread at a time
 
Similarities
 
Free-threaded record + submit
 
Command buffer contents are opaque
Can’t ship pre-built buffers like on console
 
No state inheritance across buffers
 
Differences
 
Metal command buffers are one-shot
Vulkan, D3D12 allow more re-use
  Re-submit same buffer across frames
  Invoke one command buffer from another
    Limited form of command-buffer call/return
    “Second-level” command buffer / “bundle”
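
To make the call/return idea concrete, here is a small sketch in which a primary command list can either hold a draw or invoke a pre-recorded bundle. Bundle, Command, and execute are hypothetical names for this toy model; real second-level command buffers and bundles carry API-specific restrictions that are not modeled here.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy model: a bundle is recorded once and can be re-used.
struct Bundle { std::vector<std::string> draws; };

// A primary command is either a direct draw or a call into a bundle.
using Command = std::variant<std::string, const Bundle*>;

// Expand a primary command list into the flat sequence of draws it produces.
std::vector<std::string> execute(const std::vector<Command>& primary) {
    std::vector<std::string> out;
    for (const Command& c : primary) {
        if (const auto* draw = std::get_if<std::string>(&c)) {
            out.push_back(*draw);
        } else {
            const Bundle* b = std::get<const Bundle*>(c);
            out.insert(out.end(), b->draws.begin(), b->draws.end());
        }
    }
    return out;
}
```

The bundle is only one level deep in this sketch, matching the "limited form" of call/return described above.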
 
Pipeline State Objects
 
 
 
State-Change Granularity

GL 1.0: “the OpenGL State Machine”

    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);

D3D10: aggregate state objects

    d3dDevice->CreateBlendState(&blendDesc, &blendStateObj);
    ...
    d3dContext->OMSetBlendState(blendStateObj);
 
Pipeline State Object (PSO)
 
Encapsulates most of GPU state vector
Application switches between full PSOs
 
Compile and validate early
Avoid driver pauses when changing state
May not play nice with some engine designs
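
One common way to get "compile and validate early" is to key pipelines by their full description and build each one once. The sketch below assumes a heavily simplified PipelineDesc and a PipelineCache whose getOrCreate stands in for the expensive driver work; both are hypothetical and do not correspond to any real API structure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, heavily simplified description of a full pipeline state.
struct PipelineDesc {
    std::string vertexShader;
    std::string fragmentShader;
    bool blendEnabled;
    bool depthTestEnabled;

    bool operator<(const PipelineDesc& o) const {
        return std::tie(vertexShader, fragmentShader, blendEnabled, depthTestEnabled) <
               std::tie(o.vertexShader, o.fragmentShader, o.blendEnabled, o.depthTestEnabled);
    }
};

struct PipelineCache {
    std::map<PipelineDesc, int> pipelines;  // description -> opaque PSO handle
    std::size_t compileCount = 0;

    // Stand-in for the expensive compile + validate step; ideally called for
    // every description at load time rather than in the middle of a frame.
    int getOrCreate(const PipelineDesc& desc) {
        auto it = pipelines.find(desc);
        if (it != pipelines.end()) return it->second;
        ++compileCount;
        int handle = static_cast<int>(pipelines.size());
        pipelines.emplace(desc, handle);
        return handle;
    }
};
```

Changing a single flag such as blending yields a different description and therefore a different PSO, which is why engines that toggle state freely may need restructuring.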
 
What goes in a PSO?
 
Shader for each active stage
Much fixed-function state
  Blend, rasterizer, depth/stencil
Format information
  Vertex attributes
  Color, depth targets
 
What doesn’t go in a PSO?
 
Resource bindings
Vertex/index/constant buffers, textures, ...
 
Some pieces of fixed-function state
A bit different for each API
 
Setting Non-PSO State

Set directly on command buffer

    d3dCommandBuffer->OMSetStencilRef(0xFFFFFFFF);
    mtlCommandBuffer.setTriangleFillMode(.Lines);

Use smaller state objects (Metal/Vulkan)

    mtlCommandBuffer.setDepthStencilState(mtlDepthStencilState);
    vkCreateDynamicViewportState(device, &vpInfo, &vpState);
 
Tiled Architectures and “Passes”
 
Pass
Sequence of draw calls
Sharing same target(s)
Explicit in Metal/Vulkan
Simplifies/enables optimizations
Jesse’s talk will go in depth
 
 
Memory and Resources
 
 
 
Concepts

Allocation: range of virtual addresses
  Caching, visibility, …
Resource: memory + layout
  Buffer, Texture3D, Texture2DMSArray, …
View: resource + format/usage
  Depth-stencil view, …
 
Memory and Resources
 
Resource Binding
 
 
 
[Diagram: GPU State Vector with Pipeline State Object and Binding Tables for Textures, Buffers, Samplers]
 
Descriptor
 
GPU-specific encoding of a resource view
Size and format opaque to applications
Multiple types, based on usage
Texture, constant buffer, sampler, etc.
 
Just a block of data; not an allocation
 
Descriptor Table
 
An API object that holds multiple descriptors
Kind of like a buffer, but contents are opaque
 
 
Table may hold multiple types of descriptors
D3D12, Vulkan have different rules on this

[Diagram: GPU State Vector with Pipeline State Object and Descriptor Tables for Textures, Buffers, Samplers]
 
Pipeline Layout
 
Shaders impose constraints on table layout
“Descriptor 2 in table 0 had better be a texture”
Pipeline layout is an explicit API object
Interface between PSO and descriptor tables
 
Multiple shaders/PSOs can use same layout
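
The "descriptor 2 in table 0 had better be a texture" constraint can be sketched as a compatibility check between a layout and the tables bound at draw time. The types below (DescriptorType, PipelineLayout, DescriptorTable) and the function isCompatible are assumptions for this example; real descriptors are opaque and the real APIs express layouts quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class DescriptorType { Texture, ConstantBuffer, Sampler };

// Hypothetical layout: for each table, the type expected in each slot.
using TableLayout = std::vector<DescriptorType>;
using PipelineLayout = std::vector<TableLayout>;

// A bound table, reduced to just the types of the descriptors it holds.
using DescriptorTable = std::vector<DescriptorType>;

// True if every bound table supplies the descriptor types the layout expects.
bool isCompatible(const PipelineLayout& layout, const std::vector<DescriptorTable>& bound) {
    if (bound.size() < layout.size()) return false;
    for (std::size_t t = 0; t < layout.size(); ++t) {
        if (bound[t].size() < layout[t].size()) return false;
        for (std::size_t slot = 0; slot < layout[t].size(); ++slot)
            if (bound[t][slot] != layout[t][slot]) return false;
    }
    return true;
}
```

Because the check depends only on the layout, any number of shaders or PSOs built against the same layout can share the same bound tables.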

[Diagram, two frames: GPU State Vector with Pipeline State Object, Pipeline Layout, and Root Table; Descriptor Tables for Textures, Buffers, Samplers]
 
Descriptor Tables and Layouts
 
Data Hazards and Object Lifetimes
 
 
 
Old APIs: Driver Does it For You
 
Map a buffer that is in use?
Driver will wait, or allocate a fresh “version”
Render to image, then use as texture?
Driver notices the change, makes it work
Allocate more textures than fit in GPU memory?
Driver will page stuff in/out to make room
 
New APIs: You Do It Yourself
 
Explicitly synchronize CPU/GPU
Explicitly manage object lifetimes
Explicitly manage residency (D3D12)
Explicitly signal resource transitions
Done drawing to target, about to use as texture
 
Explicitly Synchronize CPU/GPU
 
No automatic “versioning” of resources
No “map discard” or equivalent
Don’t write to something GPU might be reading
Use explicit events to synchronize
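
Without driver-side versioning, one plausible pattern is to keep several copies of a per-frame buffer and only overwrite a copy once the GPU has signaled that it is done with it. This is a minimal sketch under that assumption; Fence and FrameRing are hypothetical stand-ins, and the "GPU" is simulated by setting a counter by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy model: the "GPU" bumps `completed` as it finishes work.
struct Fence { std::uint64_t completed = 0; };

// N copies of a per-frame buffer; each slot remembers the fence value that
// will be signaled when the GPU has finished reading it.
template <std::size_t N>
struct FrameRing {
    std::array<std::uint64_t, N> lastUse{};  // 0 = never used
    std::uint64_t nextValue = 1;
    std::size_t current = 0;

    // CPU may overwrite the current slot only once the GPU is done with it.
    bool canWrite(const Fence& f) const { return lastUse[current] <= f.completed; }

    // Called after submitting a frame that reads the current slot: tag the
    // slot with the fence value that submission will signal, then advance.
    std::uint64_t markSubmitted() {
        std::uint64_t value = nextValue++;
        lastUse[current] = value;
        current = (current + 1) % N;
        return value;
    }
};
```

When canWrite returns false, the application has to wait on the event or do other work; nothing is versioned behind its back.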
 
Explicitly Manage Object Lifetimes
 
Don’t delete something GPU is using
Same basic problem as not writing to it
Use explicit events to synchronize
 
Sounds like a lot of busy-work, right?
Not actually that bad in practice
Other speakers will share strategies
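
One strategy of this kind is deferred deletion: rather than destroying an object immediately, queue it with the fence value of the last submission that might reference it, and destroy it once that value has completed. The sketch below is an assumption-laden toy (DeferredDeleter is a hypothetical name, and it assumes retire is called with non-decreasing fence values), not a description of any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy model of a deferred-deletion queue.
struct DeferredDeleter {
    struct Pending { std::uint64_t fenceValue; std::function<void()> destroy; };
    std::deque<Pending> pending;  // assumed ordered by ascending fence value

    // Instead of destroying now, remember the fence value of the last
    // submission that may still reference the object.
    void retire(std::uint64_t lastUseFenceValue, std::function<void()> destroy) {
        pending.push_back({lastUseFenceValue, std::move(destroy)});
    }

    // Destroy everything the GPU has finished with; returns how many
    // objects were released.
    std::size_t collect(std::uint64_t completedFenceValue) {
        std::size_t released = 0;
        while (!pending.empty() && pending.front().fenceValue <= completedFenceValue) {
            pending.front().destroy();
            pending.pop_front();
            ++released;
        }
        return released;
    }
};
```

Calling collect once per frame with the latest completed fence value keeps the bookkeeping in one place.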
 
Explicitly Signal Resource Transitions
 
Done rendering image, use as texture
Driver may need to do “stuff”
Insert execution barriers
Flush caches
Decompress data
 
Resource Transitions
 
Conceptually every resource is in one “state”
Texture, color target, depth target, …
Switching state is an explicit command
Well-defined time for driver to insert “stuff”
 
Use resource when in wrong state: error
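
The state model can be sketched as a tiny tracker in which transitions are explicit and a mismatched use is reported as an error. ResourceState and TrackedResource are hypothetical names with only three states; the real APIs define many more states and different barrier mechanics.

```cpp
#include <cassert>
#include <stdexcept>

enum class ResourceState { ColorTarget, DepthTarget, Texture };

// Hypothetical toy model of a resource that is always in exactly one state.
struct TrackedResource {
    ResourceState state;
    int transitions = 0;  // how many explicit transitions were recorded

    // Explicit transition command: the well-defined point where a driver
    // could insert barriers, flush caches, or decompress data.
    void transition(ResourceState next) {
        if (next != state) { state = next; ++transitions; }
    }

    // Using the resource in the wrong state is an application error.
    void use(ResourceState required) const {
        if (state != required) throw std::logic_error("resource used in wrong state");
    }
};
```

In this sketch the error is an exception; in practice it is more likely to surface through a validation or debug layer, or as undefined behavior.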
 
Summary
 
 
It is all about trade-offs
 
You get more control, predictability
More like programming for a console
 
In return, you get more responsibility
App must do what driver used to
More like programming for a console…
 
Are the trade-offs worth it?
 
You’ll need to decide for yourself
 
Other speakers will share their experience
Benefits they’ve seen from these APIs
Strategies to make working with them easier
Command Buffers and Queues

                 D3D12                Metal              Vulkan
Command buffer   ID3D12CommandList    MTLCommandBuffer   VkCmdBuffer
Queue            ID3D12CommandQueue   MTLCommandQueue    VkCmdQueue

Memory and Resources

             D3D12                      Vulkan
Allocation   ID3D12Heap                 VkDeviceMemory
Resource     ID3D12Resource             VkImage, VkBuffer
View         ID3D12DepthStencilView,    VkImageView,
             ID3D12RenderTargetView     VkBufferView

Descriptor Tables and Layouts

D3D12                         Vulkan
ID3D12DescriptorHeap          VkDescriptorPool
-                             VkDescriptorSet
D3D12_ROOT_DESCRIPTOR_TABLE   VkDescriptorSetLayout
ID3D12RootLayout              VkPipelineLayout
