Understanding Vulkan: A Comprehensive Overview


In this whirlwind tour of Vulkan, Graham Sellers of AMD walks through the Vulkan system architecture, its major design goals, application startup, physical devices and device information, and logical devices. The design goals he highlights are high performance, scalability, and a solid foundation for future development, for both graphics and compute applications.


Uploaded on Oct 05, 2024 | 0 Views



Presentation Transcript


  1. A Whirlwind Tour of Vulkan Graham Sellers, AMD @grahamsellers

  2. Architecture Your code Khronos IHV / Driver Hardware APPLICATION LOADER DRIVER DRIVER GPU GPU GPU

  3. Overview Overview of the Vulkan System Outline design goals Show example API usage

  4. Goals Major Vulkan design goals High performance from a single thread Scalable to many threads Scalable across wide range of architectures Solid foundation for future development Solve ecosystem issues

  5. Application Startup Vulkan is represented by an instance Application can have multiple Vulkan instances Instance is owned by the loader Aggregates drivers from multiple vendors Responsible for discovery of GPUs Makes multiple drivers look like one big driver supporting many GPUs

  6. Application Startup Application specifies to loader: Information about itself Callback interface for memory allocation VkApplicationInfo appInfo = { ... }; VkAllocCallbacks allocCb = { ... }; VkInstance instance; vkCreateInstance(&appInfo, &allocCb, &instance); Get back a Vulkan instance

  7. Physical Devices Devices are explicitly enumerated in Vulkan uint32_t devCount; VkPhysicalDevice devices[10]; vkEnumeratePhysicalDevices(instance, ARRAYSIZE(devices), &devCount, devices); This produces a list of devices Integrated + discrete Multiple discrete GPUs in one system Application manages multiple devices

  8. Device Information Applications can query information about devices VkPhysicalDeviceFeatures features = {}; vkGetPhysicalDeviceFeatures(physicalDevice, &features); Returns lots of information about the device Capabilities, optional features, memory sizes, performance characteristics, etc., etc.

  9. Logical Devices Logical device is a software representation of a GPU This is what your application communicates with VkDeviceCreateInfo info = { ... }; VkDevice device; vkCreateDevice(physicalDevice, &info, &device); Parameters include information about application What features it will use Which queues, extensions, etc.

  10. Queues Work is performed on queues Queues run asynchronously to each other Queues have different capabilities Graphics, compute, DMA operations Property of physical device

  11. Queues Get queue handle from the device VkQueue queue; vkGetDeviceQueue(device, 0, 0, &queue); Queues are represented as members of families Each family has specific capabilities There are one or more queues in each family Family and index are the two parameters above
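
The family-and-index idea on the two Queues slides can be sketched without the Vulkan headers. The block below is a minimal model, assuming a hypothetical `SketchQueueFamily` struct and `SKETCH_QUEUE_*` capability bits (none of these names come from the talk or from the API): it picks the first family whose capabilities cover what the application needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; these are NOT the Vulkan enums. */
enum {
    SKETCH_QUEUE_GRAPHICS = 1u << 0,
    SKETCH_QUEUE_COMPUTE  = 1u << 1,
    SKETCH_QUEUE_DMA      = 1u << 2
};

/* Hypothetical description of one queue family. */
typedef struct {
    uint32_t capabilities; /* bitwise OR of SKETCH_QUEUE_* */
    uint32_t queue_count;  /* number of queues in the family */
} SketchQueueFamily;

/* Returns the index of the first family that has every required capability
 * and at least one queue, or -1 if no family qualifies. */
int sketch_find_queue_family(const SketchQueueFamily *families,
                             size_t family_count, uint32_t required)
{
    assert(families != NULL || family_count == 0);
    for (size_t i = 0; i < family_count; ++i) {
        if (families[i].queue_count > 0 &&
            (families[i].capabilities & required) == required) {
            return (int)i;
        }
    }
    return -1;
}
```

The returned index plays the role of the family parameter in the slide's `vkGetDeviceQueue` call; the queue index within that family would then range from 0 to `queue_count - 1`.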

  12. Command Buffers Commands are sent to a queue in command buffers VkCmdBufferCreateInfo info; VkCmdBuffer cmdBuffer; vkCreateCommandBuffer(device, &info, &cmdBuffer); Creation parameters include: Which queue family it will be submitted to How aggressively drivers should optimize? etc.

  13. Command Buffers Commands are inserted into command buffers VkCmdBufferBeginInfo info = { ... }; vkBeginCommandBuffer(cmdBuf, &info); vkCmdDoThisThing(cmdBuf, ...); vkCmdDoSomeOtherThing(cmdBuf, ...); vkEndCommandBuffer(cmdBuf); Driver heavy lifting happens here State validation, optimization, etc.
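
The begin, record, end, submit lifecycle described on the two Command Buffers slides can be modelled in a few lines. This is only a sketch with hypothetical names (`SketchCommandBuffer`, `sketch_record`, and so on); "submitting" here just sums the recorded work so that the record-once, submit-many pattern is visible.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is the Vulkan API. */
#define SKETCH_MAX_COMMANDS 16

typedef enum { SKETCH_CMD_DRAW, SKETCH_CMD_DISPATCH } SketchCommandKind;

typedef struct {
    SketchCommandKind kind;
    unsigned count; /* vertices for a draw, groups for a dispatch */
} SketchCommand;

typedef struct {
    SketchCommand commands[SKETCH_MAX_COMMANDS];
    size_t length;
    int recording; /* nonzero between begin and end */
} SketchCommandBuffer;

/* Starts recording and discards anything recorded earlier. */
void sketch_begin(SketchCommandBuffer *cb)
{
    assert(cb != NULL);
    cb->length = 0;
    cb->recording = 1;
}

/* Appends one command; returns 0 on success, -1 if not recording or full. */
int sketch_record(SketchCommandBuffer *cb, SketchCommandKind kind, unsigned count)
{
    if (!cb->recording || cb->length == SKETCH_MAX_COMMANDS) return -1;
    cb->commands[cb->length].kind = kind;
    cb->commands[cb->length].count = count;
    cb->length++;
    return 0;
}

/* Finishes recording; after this the buffer is read-only and submittable. */
void sketch_end(SketchCommandBuffer *cb) { cb->recording = 0; }

/* "Executes" a finished buffer by summing its work items.
 * Returns -1 if the buffer is still being recorded. */
long sketch_submit(const SketchCommandBuffer *cb)
{
    long total = 0;
    if (cb->recording) return -1;
    for (size_t i = 0; i < cb->length; ++i) total += (long)cb->commands[i].count;
    return total;
}
```

Because a finished buffer is never modified by `sketch_submit`, it can be submitted any number of times, which is where the low amortized cost comes from.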

  14. Pipelines Pipelines contain most state Compiled up front, used in command buffers VkGraphicsPipelineCreateInfo info = { ... }; VkPipeline pipeline; vkCreateGraphicsPipelines(device, cache, 1, &info, &pipeline); Contains compiled shaders, blend, multisample, etc. Pipelines can be serialized into a cache Improves application load time

  15. Shaders Shaders are compiled up front VkShaderCreateInfo info = { ... }; VkShader shader; vkCreateShader(device, &info, &shader); Primary (only) shading language for Vulkan is SPIR-V Vendor neutral binary intermediate form Same SPIR-V as used in OpenCL 2.1 Reference GLSL -> SPIR-V compiler available

  16. Mutable State A lot of pipeline state is immutable Some state is dynamic Represented by smaller chunks of state VkDynamicViewportStateCreateInfo vpInfo = { ... }; VkDynamicViewportState vpState; vkCreateDynamicViewportState(device, &vpInfo, &vpState); VkDynamicDepthStencilCreateInfo dsInfo = { ... }; VkDynamicDepthStencilState dsState; vkCreateDynamicDepthStencilState(device, &dsInfo, &dsState);

  17. State Binding State is bound to command buffers vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline); vkCmdBindDynamicViewportState(cmdBuffer, vpState); vkCmdBindDynamicDepthStencilState(cmdBuffer, dsState); State is inherited from draw to draw It is not inherited across command buffer boundaries Incremental update by dynamic state binding
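
The inheritance rule on the State Binding slide (state carries over from draw to draw, but not across command buffer boundaries) can be illustrated with a small model. The names below are hypothetical, and returning an error for a draw with missing state is a choice made for this sketch, not a statement about what a driver does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-command-buffer bound state; not the Vulkan API. */
typedef struct {
    int pipeline;       /* 0 means "nothing bound" */
    int viewport_state; /* 0 means "nothing bound" */
    unsigned draws;     /* draws accepted so far */
} SketchBoundState;

/* Beginning a command buffer starts from a clean slate: nothing is inherited
 * from whatever was recorded into a previous command buffer. */
void sketch_begin_state(SketchBoundState *s)
{
    assert(s != NULL);
    s->pipeline = 0;
    s->viewport_state = 0;
    s->draws = 0;
}

/* Binding replaces only the piece of state named; the rest stays bound. */
void sketch_bind_pipeline(SketchBoundState *s, int pipeline) { s->pipeline = pipeline; }
void sketch_bind_viewport(SketchBoundState *s, int viewport) { s->viewport_state = viewport; }

/* A draw uses whatever is currently bound. This sketch returns -1 when
 * required state is missing and 0 otherwise. */
int sketch_draw(SketchBoundState *s)
{
    if (s->pipeline == 0 || s->viewport_state == 0) return -1;
    s->draws++;
    return 0;
}
```

Rebinding only the viewport between two draws is the "incremental update" case: the pipeline bound earlier in the same command buffer is still in effect for the second draw.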

  18. Derivative State Pipelines can be derived from other pipelines Create a master pipeline template Modify creation parameters, create derivative Provides performance opportunity During creation, drivers can re-use state At runtime, fast to switch between related states

  19. Vulkan Resources Resources are data that can be accessed by the device Examples are buffers and images Resources represented by API objects VkImageCreateInfo imageInfo = { ... }; VkImage image; vkCreateImage(device, &imageInfo, &image); VkBufferCreateInfo bufferInfo = { ... }; VkBuffer buffer; vkCreateBuffer(device, &bufferInfo, &buffer); Memory for resources is managed by the application

  20. Device Memory Applications query objects for their memory needs: VkMemoryRequirements reqs; vkGetImageMemoryRequirements(device, image, &reqs); Application allocates memory for objects: VkMemoryAllocInfo memInfo = { ... }; VkDeviceMemory mem; vkAllocMemory(device, &memInfo, &mem); Application binds memory to the resource: vkBindImageMemory(device, image, mem, 0);

  21. Managing Memory Application managed memory: Application does pool management Multiple resources in a single allocation Avoid overhead of allocation per object Recycle memory between objects
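
One simple way to put multiple resources in a single allocation is a linear (bump) sub-allocator that hands out aligned offsets. The sketch below is an assumption about how an application might do this, not something shown in the talk; `SketchMemoryPool` and its functions are hypothetical, and the size and alignment arguments stand in for the values a memory-requirements query would report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical linear sub-allocator over one large device allocation.
 * It only hands out offsets; it never touches real memory. */
typedef struct {
    uint64_t capacity; /* size of the single backing allocation */
    uint64_t used;     /* offset of the first free byte */
} SketchMemoryPool;

#define SKETCH_ALLOC_FAILED UINT64_MAX

/* Rounds value up to the next multiple of alignment (a power of two). */
static uint64_t sketch_align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Reserves size bytes at the required alignment and returns the offset to
 * bind the resource at, or SKETCH_ALLOC_FAILED if the pool is exhausted. */
uint64_t sketch_pool_alloc(SketchMemoryPool *pool, uint64_t size, uint64_t alignment)
{
    uint64_t offset = sketch_align_up(pool->used, alignment);
    if (offset > pool->capacity || size > pool->capacity - offset) {
        return SKETCH_ALLOC_FAILED;
    }
    pool->used = offset + size;
    return offset;
}

/* Recycles the whole pool at once, for example when every resource placed
 * in it has been retired. */
void sketch_pool_reset(SketchMemoryPool *pool) { pool->used = 0; }
```

The offset returned here would be passed as the last argument of a bind call such as the slide's `vkBindImageMemory(device, image, mem, 0)`, in place of the literal 0. A linear pool cannot free individual objects; it trades that flexibility for a very cheap allocation path.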

  22. Sharing Data Unlike OpenGL, memory is mapped, not buffers Bind memory to buffer Map memory for CPU access vkMapMemory(device, mem, offset, size, flags, &pData); Flags control how memory is allocated and mapped Control over caching, coherency, etc. provided Zero-copy and UMA fully supported

  23. Descriptors Vulkan resources are represented by descriptors Descriptors are arranged in sets Sets are allocated from pools Sets have layouts, known at pipeline creation time vkCreateDescriptorPool(...); vkCreateDescriptorSetLayout(...); vkAllocDescriptorSets(...);

  24. Pipeline Layouts Layouts represent arrangement of sets used by pipelines Layout is shared between sets and pipelines Layout represented by VkPipelineLayout object Used at pipeline create time Switch pipelines using sets of the same layout Pipelines are considered compatible vkCreatePipelineLayout(...);

  25. Render Passes Frames logically organized into render passes VkRenderPassCreateInfo info = { ... }; VkRenderPass renderPass; vkCreateRenderPass(device, &info, &renderPass); Render pass contains a lot of information: Layout and types of framebuffer attachments What to do when the render pass begins and ends Part of the framebuffer that the pass may affect

  26. Merging Passes Vulkan has the concept of a sub-pass Allows multiple render passes to be merged Intermediate attachments for transient data Data passed from pass to pass Tile-based architectures can keep data on chip Might reuse memory for temporary surfaces

  27. Drawing Draws are always inside a render pass VkRenderPassBegin beginInfo = { renderPass, ... }; vkCmdBeginRenderPass(cmdBuffer, &beginInfo); vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline); vkCmdBindDescriptorSets(cmdBuffer, ...); vkCmdDraw(cmdBuffer, 0, 100, 1, 0); vkCmdEndRenderPass(cmdBuffer, renderPass); All draw types supported: instancing, indirect, etc.

  28. Compute Compute pipelines are special Possible to have (multiple) compute-only queues Queues run asynchronously Yes, asynchronous compute VkComputePipelineCreateInfo info = { ... }; VkPipeline pipeline; vkCreateComputePipeline(device, cache, 1, &info, &pipeline); Compute launched through dispatches

  29. Synchronization Work is synchronized through event primitives VkEventCreateInfo info = { ... }; VkEvent event; vkCreateEvent(device, &info, &event); Events may be set, reset, polled and waited on vkSetEvent(...); vkResetEvent(...); vkGetEventStatus(...); vkCmdSetEvent(...); vkCmdResetEvent(...); vkCmdWaitEvents(...);

  30. Resource State Resources can be in any of many states Renderable, CPU read, shader read or write, etc. Drivers used to track this information Not any more! Now it's your job VkImageMemoryBarrier imageBarrier = { ... }; vkCmdPipelineBarrier(cmdBuffer, ..., 1, &imageBarrier); Pass old state + stages, new state + stages Driver will take care of the rest
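
Since tracking resource state is now the application's job, a tiny tracker shows the bookkeeping involved. This is a simplified sketch with hypothetical names: it records only "old state, new state" and deliberately ignores pipeline stages and hazards between accesses made in the same state, both of which a real application must also consider.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource states; the real API names these differently. */
typedef enum {
    SKETCH_STATE_UNDEFINED,
    SKETCH_STATE_RENDER_TARGET,
    SKETCH_STATE_SHADER_READ,
    SKETCH_STATE_CPU_READ
} SketchResourceState;

typedef struct {
    SketchResourceState current; /* state the application believes the image is in */
    unsigned barriers_recorded;  /* how many transitions were actually needed */
} SketchTrackedImage;

/* Describes one transition, mirroring "old state, new state". */
typedef struct {
    SketchResourceState old_state;
    SketchResourceState new_state;
} SketchBarrier;

/* Moves the image to `wanted`. Fills *out and returns 1 when a barrier must
 * be recorded; returns 0 when the image is already in that state. */
int sketch_transition(SketchTrackedImage *image, SketchResourceState wanted,
                      SketchBarrier *out)
{
    assert(image != NULL && out != NULL);
    if (image->current == wanted) return 0;
    out->old_state = image->current;
    out->new_state = wanted;
    image->current = wanted;
    image->barriers_recorded++;
    return 1;
}
```

When `sketch_transition` returns 1, the filled-in `SketchBarrier` is the information the application would hand to a barrier command such as the slide's `vkCmdPipelineBarrier`.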

  31. Work Submission Work is submitted to queues for execution VkCmdBuffer commandBuffers[] = { cmdBuffer1, cmdBuffer2, ...}; vkQueueSubmit(queue, 1, commandBuffers, fence); A fence (VkFence) is associated with the submission This is signaled when work completes CPU can wait on this fence Queues marshal resource ownership with semaphores vkQueueSignalSemaphore(queue, semaphore); vkQueueWaitSemaphore(queue, semaphore);

  32. Threading Threading is a big consideration API doesn't lock; that's the application's responsibility Concurrent read access to same object Concurrent write access to different objects Performance from one thread will still be good

  33. Presentation Displaying outputs is optional! We expect some compute-only Vulkan applications No real need to create a window (console mode) Each platform is different Presentation is an extension We define two flavors of the Window System Interface One is for compositors, one is for direct-display

  34. Displays Vulkan also abstracts some display management Also delegated to WSI extensions Manage display mode Turn vsync on and off Enumerate and take control of displays This all depends on platform support, of course!

  35. Teardown Application responsible for object destruction Must be correctly ordered No reference counting No implicit object lifetime Do not delete objects that are still in use! This includes use by GPU
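
With no reference counting and no implicit lifetimes, one common way for an application to get destruction order right is to record each object as it is created and destroy everything in reverse. The sketch below assumes that approach; `SketchDeletionStack` and the logging callback are hypothetical helpers, not part of the API or the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destruction stack; not part of the Vulkan API. */
#define SKETCH_MAX_OBJECTS 32

typedef void (*SketchDestroyFn)(void *object);

typedef struct {
    SketchDestroyFn destroy[SKETCH_MAX_OBJECTS];
    void *object[SKETCH_MAX_OBJECTS];
    size_t count;
} SketchDeletionStack;

/* Registers an object right after it is created. Returns 0, or -1 if full. */
int sketch_defer_destroy(SketchDeletionStack *stack, SketchDestroyFn fn, void *object)
{
    assert(stack != NULL && fn != NULL);
    if (stack->count == SKETCH_MAX_OBJECTS) return -1;
    stack->destroy[stack->count] = fn;
    stack->object[stack->count] = object;
    stack->count++;
    return 0;
}

/* Destroys everything in reverse creation order, so an object goes away
 * before the objects it was created from. The caller must first make sure
 * the GPU has finished using all of them. */
void sketch_destroy_all(SketchDeletionStack *stack)
{
    while (stack->count > 0) {
        stack->count--;
        stack->destroy[stack->count](stack->object[stack->count]);
    }
}

/* Example callback: logs the one-character name the object pointer refers
 * to, so the destruction order can be inspected afterwards. */
char sketch_log[SKETCH_MAX_OBJECTS + 1];
size_t sketch_log_length;

void sketch_log_destroy(void *object)
{
    if (sketch_log_length < SKETCH_MAX_OBJECTS) {
        sketch_log[sketch_log_length++] = *(const char *)object;
    }
}
```

In a real application each callback would wrap the matching destroy call, and `sketch_destroy_all` would run only after waiting for outstanding GPU work, since "in use" includes use by the GPU.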

  36. Scalability Scalability is an important goal Scales from low power mobile to high end workstation Many features optional Queryable upper limits for most things Still considering how to bundle features Want to avoid sea of caps problem May defer to platform owners

  37. Extensibility Vulkan has a first class extension mechanism Extensions are opt-in No more using extensions by accident Don't pay driver tax for unused features Much easier to validate Still want to expose bleeding edge Vulkan is a platform for innovation

  38. Tools and Debugging Tools and development are key to success Strong tools mean better applications Vulkan is not simple; tools are a must Khronos is looking to build a strong ecosystem Tools, loader and other components open source Well documented hooks for extending API

  39. Tools and Debugging APPLICATION LOADER TOOLS LAYERS DRIVER DRIVER GPU GPU GPU

  40. Layers Loader supports layering APIs Formal hooks for debuggers and tools No more interceptors, shims, or stub libraries Validation in intermediate layers Opt-in, very powerful Several layers already developed API trace, parameter validation, API timing, etc.

  41. Layers Multiple types of layer Instance level layers Enabled at instance creation time Globally available to every device in instance Device level layers Specific to device Enable device-specific extensions, for example
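
The layering idea from the two Layers slides amounts to a chain of objects that share one entry-point signature, where each link may inspect a call before forwarding it toward the driver. The following is a conceptual sketch only: the types and functions are hypothetical, and the real loader's layer interface is more involved than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch chain; the real loader's layer interface differs. */
typedef struct SketchLayer SketchLayer;

/* Every link in the chain exposes the same entry-point signature. */
typedef int (*SketchDrawFn)(const SketchLayer *self, int vertex_count);

struct SketchLayer {
    SketchDrawFn draw;
    const SketchLayer *next; /* next layer, or NULL for the driver itself */
    int *calls_seen;         /* optional per-layer counter, may be NULL */
};

/* Bottom of the chain: stands in for the driver. */
int sketch_driver_draw(const SketchLayer *self, int vertex_count)
{
    (void)self;
    return vertex_count; /* pretend this many vertices were processed */
}

/* A validation layer: rejects bad parameters before the driver sees them. */
int sketch_validation_draw(const SketchLayer *self, int vertex_count)
{
    assert(self != NULL && self->next != NULL);
    if (vertex_count < 0) return -1;
    return self->next->draw(self->next, vertex_count);
}

/* A trace layer: counts calls, then forwards them unchanged. */
int sketch_trace_draw(const SketchLayer *self, int vertex_count)
{
    assert(self != NULL && self->next != NULL);
    if (self->calls_seen != NULL) ++*self->calls_seen;
    return self->next->draw(self->next, vertex_count);
}
```

Opting out of a layer in this model means simply not linking it into the chain, so an application that enables no layers calls the driver entry point directly and pays nothing for validation or tracing.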

  42. Summary Not really low-level, just a better abstraction Very low overhead: Low overhead means more application CPU cycles Explicit threading support means you can go wide without worrying about graphics APIs Building command buffers once and submitting many times means low amortized cost

  43. Summary Cross-platform, cross-vendor Not tied to single OS (or OS version) Not tied to single GPU family or vendor Not tied to single architecture Desktop + mobile, forward and deferred, tilers all first class citizens

  44. Summary Open, extensible Khronos is an open standards body Collaboration from across the industry, IHVs + ISVs, games, CAD, Pro Graphics, AAA + casual Full support for extensions, layering, debuggers, tools SPIR-V fully documented: write your own compiler!

  45. Thanks! @grahamsellers www.khronos.org/vulkan
