Understanding Rendering Pipeline and Rasterisation Techniques in Graphics APIs


Explore the rendering pipeline in OpenGL and DirectX, learn optimization techniques for rasterization, compare the common modules of the two pipelines, and examine fundamental optimization methods applied after vertex post-processing. The slides review the literature and system models, then walk through the OpenGL rendering pipeline top-down, covering vertex pre-processing, vertex processing, and the later stages.


Uploaded on Oct 09, 2024 | 0 Views



Presentation Transcript


  1. RENDERING PIPELINE AND RASTERISATION TECHNIQUES

  2. AIMS AND OBJECTIVES Research and understand rendering pipeline for graphics APIs and optimisation techniques for rasterisation. OpenGL 4.0+ DirectX 11 Requirements Obtain an overview of common modules in the rendering pipeline. Comparison between graphics API stacks: OpenGL and DirectX Fundamental optimisation techniques after vertex post-processing

  3. LITERATURE REVIEW Mandatory sources Web links OpenGL Rendering Pipeline [wiki] DirectX 11 Programming Guide [online resource] Presentations Introduction to Computer Graphics, Lecture 5, Rasterisation, J. Schulze Ph.D, University of California, 2012 Additional sources Academic sources Research from top academic institutions with journals in ACM TOG and IJCAET Stanford, Berkeley, Toronto, MIT, ETH Zurich, Karlsruhe Institute of Technology and Southern California Previous Research Physics Engine : J. Gunther, Saarland University, Christer Ericson Santa Monica Studios, Bruce Naylor Spatial Labs Inc and Eric Lengyel Course Technology PTR Lectures at the University of Surrey, Computer Graphics, Prof. Adrian Hilton, CVSSP

  4. SYSTEM MODEL OPENGL RENDERING PIPELINE

  5. OPENGL RENDERING PIPELINE Top-down approach to research: Pipeline overview, comparisons and application of techniques. OpenGL is a platform-independent graphics API that converts a sequence of vertex attributes to (u,v) screen co-ordinates for display in the frame buffers. [Figure: pipeline diagram. Stages: Vertex Pre-Processing, Vertex Processing, Vertex Post-Processing, Primitive Assembly, Rasterisation, Fragment Shader, Per-Pixel Sampling. Data labels: Vertex Attributes, Primitive Sequence, Projected Patches, Projected Primitives, Ordered Base Primitives, Fragments, Coloured (u,v) Fragments, Pixels.]

  6. VERTEX PRE-PROCESSING Vertex array object (VAO) provides input attributes via drawing command. Input stream of vertex attributes describing rendering behaviours {Vertex, Normals, Texture (u,v), Colour, Indices} Stream in order of precedence. Buffer objects formats and binds data stream into bidirectional-map of index versus attributes for unique contiguous memory VAO wraps buffer objects and states for retrieving buffered attributes VBO uses indices to query attributes in the current buffer object and renders primitive based on order and type specification from drawing command

  7. VERTEX PROCESSING Per-vertex processing stage Vertex shading of packed vertex attributes Tessellating vertex patches into continuous base primitives and defining transformed vertex properties Geometry shading primitives into one or more outputs for layered rendering and transform feedback Geometry Shader (GS) TES output/drawing command Post-transform vertices Vertex Shader (VS) Tessellation Enable TES Bypass TCS N Buffer objects Primitive Generator Tessellation Control Shader (TCS) Tessellation Evaluation Shader (TES) Transform Feedback

  8. VERTEX SHADER Per-Vertex operations - Input buffer provides vertex attributes in order of precedence via drawing command. Projects attributes to post-transform space and colours vertices based on input attributes May compute local transforms (affine) and vertex lighting with ray tracing Invariant 1:1 mapping of input to output Unique vertex attributes have one projected outcome 1 Execution per unique input Advantage: Post-Transform Cache uses frequency invocation to reduce copy operations and re-computations. Outputs post-transformed data and render states into VAO and vertex indices through VBO. [Figure: attribute cache holding {x1,y1,z1} {u1,v1} {r1,g1,b1} ... {xn,yn,zn} {un,vn} {rn,gn,bn} for Idx = [1, N]; vertex shading queries the post-transform cache by index, gets the attributes on a hit, and on a failed query computes and writes the output; axis: Execution Time]
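
The caching behaviour on this slide (one shader execution per unique index, reuse on repeat) can be sketched as below. This is a minimal illustration under stated assumptions, not how any driver implements it: `shade_vertex` is a hypothetical stand-in for a vertex shader, and an unbounded dictionary replaces the small fixed-size cache found on real hardware.

```python
def shade_vertex(position):
    """Hypothetical stand-in for a vertex shader: uniform scale by 2."""
    x, y, z = position
    return (2.0 * x, 2.0 * y, 2.0 * z)


def process_indexed(vertices, indices):
    """Shade each unique index once and reuse the cached result afterwards."""
    cache = {}        # index -> post-transform vertex (assumed unbounded)
    invocations = 0   # number of actual shader executions
    output = []
    for idx in indices:
        if idx not in cache:              # failed query: compute and write
            cache[idx] = shade_vertex(vertices[idx])
            invocations += 1
        output.append(cache[idx])         # successful query: reuse
    return output, invocations


# Two triangles sharing an edge: six indices, four unique vertices.
quad_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
quad_indices = [0, 1, 2, 0, 2, 3]
shaded, invocation_count = process_indexed(quad_vertices, quad_indices)
```

With the shared edge, six indices are drawn but the stand-in shader runs only four times.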

  9. VERTEX PROCESSING - TCS Goal is to prepare patches in post-transform space to sub-divide patches into continuous primitives Patches are independent general purpose primitives of N vertices ( N up to GL_MAX_PATCH_VERTICES, which is at least 32 ) QA stage for tessellation algorithms Optional Pre-evaluate primitive count, vertex ordering and level of tessellation Define tessellation configuration Number of inner/outer tessellation levels Single patches of different levels create discontinuities on edges Independent control over inner patch tessellation TCS invocation manipulates patches to fit configuration Patch size Patch inner/outer levels Number of vertices per patch

  10. VERTEX PROCESS - GENERATOR Decides number of vertices to create, primitive generation and ordering by sub-dividing input patches based on the tessellation configuration TES must be active Configuration Primitive type Nearest neighbour distance Primitive precedence and order of creation Tessellates edges and sub-divisions with different algorithms based on TCS configuration Abstract patches with normalised vertex positions - range(0, 1) Sub-divides into N segments - N = tessellation level E.g. triangulation has outer (4Vec) = 3, inner (2Vec) = 1 Heuristic: Discard primitives (outer = 0 or vertex = NaN) Heuristic: No sub-division (outer = 1) Edge spacing is configurable Outer level clamps to effective tessellation level, f = ceil[0, max] to preserve primitive type Equal : Equal segment lengths Fractional : Clamp = ceil(4, 2^2n) (Level-2) subdivisions of equal length Two symmetrical and opposing edges where | Edge | = level f [Figure: example patches tessellated at Level = 1, 2, 3 and 4]
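
The equal-spacing case on this slide can be sketched in a few lines. The sketch assumes the OpenGL equal-spacing rule (clamp the requested level to [1, max], then round up to an integer) and a hypothetical `max_level` of 64; the function names are illustrative only.

```python
import math


def effective_level(level, max_level=64):
    """Clamp a requested level to [1, max_level] and round up to an integer."""
    clamped = min(max(level, 1.0), float(max_level))
    return math.ceil(clamped)


def subdivide_edge(level):
    """Return level + 1 normalised positions splitting [0, 1] into equal segments."""
    if level < 1:
        raise ValueError("tessellation level must be at least 1")
    return [i / level for i in range(level + 1)]


# A requested level of 3.5 rounds up to 4 equal segments.
edge_positions = subdivide_edge(effective_level(3.5))
```

A level of 1 returns only the two end points, which matches the "no sub-division" heuristic listed above.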

  11. VERTEX PROCESS - TES Goal is to evaluate the input primitive vertices into usable attributes. Mandatory for tessellation Derive new vertex properties from patches and normalised co-ordinates Interpolate vertices in post-transform cache based on local normalised position on patch 1:1 mapping from normalised to defined vertex attributes Output Tessellation Bypassed : Vertex precedence Ordered stream of base primitives with well-defined vertices [Figure: patch at Level = 4 with vertices V0, V1, V2 and interpolated vertex V5 at normalised indices (nx, ny); the vertex list is queried from the post-transform cache; V5 = nx * (V2 V1) + ny * (V0 V3)]
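
As an illustration of deriving a vertex from a patch and a normalised position, the sketch below uses plain bilinear interpolation over a four-corner patch. This is an assumed, commonly used evaluation and not necessarily the formula in the slide's figure; the corner ordering and function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between two points given as equal-length tuples."""
    return tuple(ai + (bi - ai) * t for ai, bi in zip(a, b))


def evaluate_quad_patch(v0, v1, v2, v3, nx, ny):
    """Evaluate a quad patch at normalised (nx, ny) in [0, 1] x [0, 1].

    Assumed corner layout: v0 at (0, 0), v1 at (1, 0), v2 at (1, 1), v3 at (0, 1).
    """
    bottom = lerp(v0, v1, nx)   # along the ny = 0 edge
    top = lerp(v3, v2, nx)      # along the ny = 1 edge
    return lerp(bottom, top, ny)


patch_corners = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0))
patch_centre = evaluate_quad_patch(*patch_corners, 0.5, 0.5)
```

Each normalised position maps to exactly one output vertex, which is the 1:1 mapping the slide describes.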

  12. VERTEX PROCESSING GEOMETRY SHADER (GS) Emits more/less primitives for arbitrary topologies and format output into serial vertex stream for N buffers. Transform Feedback - Records processed vertices in memory (atomic operations) Data independent of latter rendering stages (prevents repeated execution) Layered rendering Generation of different images of same size Example : mipmap requires downsampled resolution Similar input/output format TES output or primitive type from drawing command and vertex stream Format vertices based on template primitive type Topology retained through precedence (and subsequent primitive ordering) Vertex count GS is 1:many Instances create independent outcomes for same input Iterates over instances (< 32) then primitives before vertex packing (I0,P0), (I1,P0), (I0,P1), (I1,P1) Open-ended topologies discarded

  13. VERTEX POST-PROCESSING Goal is to transform vertices from the scene to camera view Primitive clamping clips homogeneous co-ordinates (x, y, z, w = 1) within the camera frustum Camera is modelled as pin-hole camera with rendering ROI (frustum) Orthographic : -wc < axis < wc clipping Perspective : Depth clamping between near-far views Perspective divide normalises vertices within clipping space (x, y, z) -> (x/w, y/w, z/w) Viewpoint transform projects post-transform vertices to the camera co-ordinates for rendering Camera transform, C includes field of view, position and near/far C = * R
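
The three post-processing steps can be sketched as below, using the standard clip test -w <= axis <= w and the standard divide (x/w, y/w, z/w). The viewport mapping assumes a lower-left origin and a [0, 1] depth range; the function names are illustrative and not part of any API.

```python
def inside_clip_volume(clip):
    """True if a clip-space point satisfies -w <= x, y, z <= w."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))


def perspective_divide(clip):
    """Clip space (x, y, z, w) to normalised device co-ordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def viewport_transform(ndc, width, height):
    """Map NDC in [-1, 1] to window co-ordinates and a [0, 1] depth value."""
    xn, yn, zn = ndc
    return ((xn + 1.0) * 0.5 * width, (yn + 1.0) * 0.5 * height, (zn + 1.0) * 0.5)


clip_point = (1.0, -1.0, 0.0, 2.0)
ndc_point = perspective_divide(clip_point)
window_point = viewport_transform(ndc_point, 800, 600)
```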

  14. PRIMITIVE ASSEMBLY AND RASTERISATION Primitive assembly converts patch stream to base primitives TES = True : Executes in tessellation Tessellation of GS output from memory cache backface culls : sign(normal) or reversed draw order Rasterisation converts vertex space into (u,v) fragments where the fragment resolution >= pixel density Interpolates primitive vertices to populate fragments with primitives colour (barycentric method) Discard fragments with pre-fragment test (drop pixel for speed-up) Capable of sub-pixel resolution for precision Additional sub-division and interpolation Fragment output as a structure {position, stencil value, GS = True {Layer idx, Viewpoint idx} }
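
The barycentric interpolation mentioned on this slide can be sketched for a single 2D triangle as follows. It is a minimal illustration that assumes per-vertex RGB colours and treats a sample as covered when all three weights are non-negative; names such as `shade_fragment` are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(a, b, c, p):
    """Barycentric weights of p with respect to triangle (a, b, c), or None."""
    area = edge(a, b, c)
    if area == 0:
        return None                       # degenerate triangle
    return (edge(b, c, p) / area, edge(c, a, p) / area, edge(a, b, p) / area)


def shade_fragment(triangle, colours, p):
    """Interpolate vertex colours at p; None if p lies outside the triangle."""
    weights = barycentric(*triangle, p)
    if weights is None or min(weights) < 0:
        return None
    return tuple(sum(w * col[i] for w, col in zip(weights, colours)) for i in range(3))


triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
colours = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
fragment_colour = shade_fragment(triangle, colours, (1.0, 1.0))
```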

  15. FRAGMENT SHADER (Optional) Per-pixel processing of active fragments into set of (u,v) colour and depth values using interpolation. Appends colour and depth information onto fragment output structure Invariant : Fragments I/O is 1:1 Bypassed = No cell position or colour (e.g. shadow mapping debugging) Optimisation : Executes per-sample processing tests early to drop fragments Implicit request : Discard for no shader execution, buffer writes and image stores. Explicit request : Still updates output buffers with fragment state Fragments Stream of passed fragments + (r,g,b) + depth (z) Stream of passed fragments

  16. PER-SAMPLE PROCESSING Per-sample tests/post-processes fragments and writes to output buffers OpenGL does not own frame buffer Pixel Ownership : Test windowed pixels and discard any not owned by OpenGL process Tests are done per fragment by comparing previous passed fragment array and new array Scissor Test : Render pixels within ROI Stencil Test : Discard if old ^ new masks != 1 Depth Test : Occlusion test which drops fragments if new(Z) >= old(Z) (default less-than comparison) Passed all : Occlusion Query sets fragment state = True Write mask determines whether write enabled for a buffer Buffer state = False - do not write Blending may be enabled to sum fragment properties between the old and new arrays sRGB default May be conditional using logic operators
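
A rough sketch of the test sequence follows, covering only the scissor test, the depth test, the depth write mask and blending. It assumes the default less-than depth comparison (a fragment survives only if it is nearer than the stored depth) and simple alpha blending; the fragment layout and function names are hypothetical.

```python
def blend(src, dst, alpha):
    """Alpha blend: alpha * source + (1 - alpha) * destination."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def per_sample(fragment, scissor, depth, colour, depth_write=True):
    """Run scissor and depth tests, then blend into the colour buffer."""
    x, y, z, rgb, alpha = fragment
    x0, y0, x1, y1 = scissor
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False                      # outside the scissor region
    if z >= depth[y][x]:
        return False                      # occluded by the stored depth
    if depth_write:                       # write mask for the depth buffer
        depth[y][x] = z
    colour[y][x] = blend(rgb, colour[y][x], alpha)
    return True


depth_buffer = [[1.0, 1.0], [1.0, 1.0]]
colour_buffer = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)] for _ in range(2)]
first_passed = per_sample((0, 0, 0.5, (1.0, 0.0, 0.0), 0.5), (0, 0, 2, 2),
                          depth_buffer, colour_buffer)
```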

  17. SYSTEM MODEL RENDERING PIPELINE COMPARISON

  18. DIRECTX 11 RENDERING PIPELINE DirectX shares many I/O protocols and control with OpenGL Order of rendering Vertex packing, streaming and buffer objects Transform feedback and caching Vertex processing modules differ in operation TCS replaced by Hull Shader Concurrent tessellation level and patch control definitions TES replaced by Domain Shader Retains caching GS output via Stream Output Stage Retains piping into Input-Assembler Adds load functionality for shader modules Implementation different e.g. Shortcuts for returning data without size queries Rasterisation wraps vertex post-processing operations Pixel Shader is Fragment Shader Output-Merger Stage is Per-Sample Processing

  19. DIRECTX 11 VERTEX OPERATIONS Vertex processing steps differ in I/O protocols but retains most operational features Added : Input-Assembler and GS can accept adjacent-edges for primitives to construct arbitrary topologies GS = False : Bypass vertex processing Dummy vertices = adjacent-edges (degenerate primitive) Vertex Shader can perform texture sampling operations Must have no screen (u,v) dependencies Concurrent Hull Shader replaces TCS Control Point Phase : Continued control point definitions for patches ( < 32 ) Control points independent of tessellation level : Allows more selection over tessellations partitioning algorithm Patch Constant Phase : Continued tessellation level definitions for continuity Domain Shader replaces TES Retains definition of vertex attribute from tessellated patches [ 1:1 mapping ] Active : Transparent input adjacency vertices Use of template primitives over topologies More user control

  20. DIRECTX 11 VERTEX OPERATIONS Geometry Shader operates on full primitives like OpenGL Same freedom in emitted primitives for topologies Invocations < 32 Additional input : Edge-adjacent primitives via Input-Assembler for user topologies Stream Output Stage has more piping functionality Shaders loading data explicitly Feedback into Input-Assembly e.g. relative camera transform without re-computing everything

  21. DIRECTX 11 PIXEL OPERATIONS Rasterisation retains operation but adds vertex post-processing and colouring via pixel shading Frustum culling Perspective divide Viewport transforms : Homogeneous post-transform vertices (x,y,z,w) to viewport (u,v) Pixel Shading retains per-pixel colouring and post-processing Pixel shader invoked by rasteriser per pixel, per layer (for multisampling grids) Combines interpolation, texture sampling (u,v dependent) and per pixel lighting Colouring applied to pixel centres not bottom-right rule. Texture sampling operates on derivatives i.e. anisotropic filtering finds partial derivatives of anisotropy axis for interpolating mipmaps Output-Merger Stage has same functionality Combines fragment outputs and per-sample tests to enable/discard fragments

  22. SYSTEM MODEL RASTERISATION TECHNIQUES

  23. CULLING Optimisation process for discarding geometry after viewpoint transform Various types of culling for vertex discarding and clipping Frustum : Out of camera frustum bounds (-wc < axis < wc) Occlusion : Hidden vertices at camera viewpoint (new(z) < old(z) = new(z) ) Small Object : Primitive dimensions < fragment resolution = drop Backface : Primitives whose backs are facing the viewpoint Degenerate : Shape area is 0 (co-linear points) Backface culling (Primitive Assembly) Index precedence : reverse orientation = back e.g. {v2 v1 v0} vs {v0 v1 v2} Normal orientation : sign( V1 x V2 / | V1 x V2 | ) > 0 Degenerate culling tests if points are equal or co-linear (1/N) (x,y) == (x,y) || count < 2 Line . Vertex = 0 [Figure: triangle with vertices v0, v1, v2 and its two possible normals, +n and -n]
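
Backface and degenerate culling can both be derived from the signed area of the projected triangle, as sketched below. The sketch assumes counter-clockwise winding marks a front face (the OpenGL default) and works on 2D screen-space points; `classify` is an illustrative name.

```python
def signed_area(v0, v1, v2):
    """Signed area of a 2D triangle; positive for counter-clockwise order."""
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                  - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def classify(v0, v1, v2):
    """Label a triangle 'front', 'back' or 'degenerate' from its winding."""
    area = signed_area(v0, v1, v2)
    if area == 0:
        return "degenerate"               # equal or co-linear points
    return "front" if area > 0 else "back"


front_face = classify((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
back_face = classify((0.0, 1.0), (1.0, 0.0), (0.0, 0.0))   # reversed index order
```

Reversing the index order flips the sign of the area, which is the "index precedence" test on the slide.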

  24. CULLING Culling difficult for objects with high vertex density Approximate object capsule as OBB for prior comparison Slice with mask if partial overlap Alternative : Hierarchical bounding volume trees construct hierarchy of OBB based on vertex density [1] Advantage : log(n) search for ray tracing, culling and physics Compare OBBs from root to leaves - State propagates upwards If : All leaf capsule passes => return Else : Identify lowest levels between true/false state and compare vertex contents for slicing root Leaf density = 1 primitive Test vertices and cull segment [Figure: bounding volume tree with nodes marked True or False] [1] J. Gunther, "Realtime Ray Tracing on GPU with BVH-based Packet Traversal", Saarland University, IEEE, 2007
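
A hierarchical cull can be sketched as a recursive overlap test that rejects a whole subtree as soon as its bounding volume misses the view region. For brevity the sketch swaps in 2D axis-aligned boxes (x_min, y_min, x_max, y_max) for the OBBs named on the slide, and the `Node` class is hypothetical.

```python
class Node:
    """Bounding volume tree node; leaves hold exactly one primitive."""

    def __init__(self, box, children=(), primitive=None):
        self.box = box
        self.children = children
        self.primitive = primitive


def overlaps(a, b):
    """True if two axis-aligned boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def visible_primitives(node, view):
    """Collect primitives whose boxes overlap the view, culling whole subtrees."""
    if not overlaps(node.box, view):
        return []                         # one test discards the entire subtree
    if node.primitive is not None:
        return [node.primitive]
    result = []
    for child in node.children:
        result.extend(visible_primitives(child, view))
    return result


leaf_a = Node((0, 0, 1, 1), primitive="a")
leaf_b = Node((5, 5, 6, 6), primitive="b")
root = Node((0, 0, 6, 6), children=(leaf_a, leaf_b))
visible = visible_primitives(root, (0, 0, 2, 2))
```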

  25. RASTERISATION Process of populating fragments with colour and depth information Repetition and fragment independence = GP-GPU problem Old : Rasterise edges and flood fill between edge pixels New : Rasterise fragments based on uniform partition and discard cells Primitive vertex inside cell and vice versa Primitive intersects with cell edges Project vertex (x,y,z) to fragment (u,v) using single matrix (64 numerical operations) Vertex (homogeneous) to fragment : [u; v] = ( D P C M ) [x; y; z; 1] [Figure: matrix labels Modelview, Image Space, Clipping, Viewpoint; arrow labelled Rendering Order in Pipeline]
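
The single-matrix projection can be sketched by composing D P C M once and reusing the product for every vertex. The matrices below are deliberately simple assumptions (a translation for M, identities for C and P, and a 100 x 100 viewport for D), chosen only to make the arithmetic easy to follow.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def viewport(width, height):
    """Map x, y in [-1, 1] to [0, width] and [0, height]."""
    return [[width / 2.0, 0.0, 0.0, width / 2.0],
            [0.0, height / 2.0, 0.0, height / 2.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

M, C, P, D = translation(1.0, 0.0, 0.0), IDENTITY, IDENTITY, viewport(100, 100)
combined = matmul(D, matmul(P, matmul(C, M)))      # composed once, reused per vertex
x, y, z, w = transform(combined, [-1.0, 0.0, 0.0, 1.0])
u, v = x / w, y / w                                # final homogeneous divide
```

Composing the product up front means each vertex costs one matrix-vector multiply and one divide, rather than four separate transforms.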

  26. VISIBILITY AND Z BUFFER Addresses object occlusion from pin-hole camera viewpoint Painter's algorithm : Rasterise objects in reverse depth order (farthest first) Disadvantage : Primitive sorting (like raycasting) and overdrawing Depth test (dedicated GPU memory) Compare primitive depth values along look-at ray trace Store attribute at minimum in fragment Alternative : Binary spatial partition scene and paint from back to front (Quake/Epic Games Engine) [2] Advantage : Z-buffer and collision combined Partition geometry along scene centre and store wall in tree Locate camera region (binary search) Traverse based on distance to camera Render wall if first node, or if wall is in unoccupied space [2] W. Thibault, "Set operations on polyhedra using binary space partitioning trees", SIGGRAPH 1987, ACM New York, pp. 153-162
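
The difference between the painter's algorithm and a depth buffer can be sketched on a single scanline. Each span is a hypothetical tuple (start, end, depth, colour) with constant depth, which is a simplification: real primitives carry per-fragment depth.

```python
def painters(spans, width):
    """Sort far to near, then draw so nearer spans overwrite farther ones."""
    row = [None] * width
    for start, end, _depth, colour in sorted(spans, key=lambda s: s[2], reverse=True):
        for x in range(start, end):
            row[x] = colour               # overdraw: every covered cell is written
    return row


def z_buffer(spans, width):
    """Draw in submission order, keeping the minimum depth per cell."""
    row = [None] * width
    depth_row = [float("inf")] * width
    for start, end, depth, colour in spans:
        for x in range(start, end):
            if depth < depth_row[x]:      # nearer than what is stored
                depth_row[x] = depth
                row[x] = colour
    return row


scanline_spans = [(0, 4, 1.0, "near"), (2, 6, 5.0, "far")]
painted = painters(scanline_spans, 6)
buffered = z_buffer(scanline_spans, 6)
```

Both produce the same scanline here, but only the painter's version needs the spans sorted first.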
