---
title: The Graphics Pipeline
date: "April 20 - 2025"
---
Ever wondered how games put all that gore on your display? All that beauty is brought to life by
a process called **rendering**, and at the heart of it is the **graphics pipeline**.
In this article we'll dive deep into the intricate details of this powerful beast.
We'll cover all the terminology needed to understand each stage, and there will be plenty of restatements, so don't
worry if you don't fully grasp something at first. If you still have questions, feel free to contact me :)
So without further ado---
## Overview
Like any pipeline, the **graphics pipeline** is composed
of several **stages**, each of which can be a pipeline in itself or even parallelized.
Each stage takes some input (data and configuration) to generate some output data for the next stage.
Application --> Geometry Processing --> Rasterization --> Pixel Processing --> Presentation
Before the heavy rendering work starts on the Graphics Processing Unit (**GPU**),
we simulate and update the world through **systems** such as the physics engine, game logic, networking, etc.
during the **application** stage.
This stage mostly runs on the Central Processing Unit (**CPU**),
which is extremely efficient at executing **sequential** code: a type of execution flow where operations
depend on the results of previous steps, limiting parallel execution.
In other words, **CPUs** are great at executing **branch-heavy** code, and **GPUs** are geared
towards executing a TON of **branch-less** or **branch-light** code in parallel---like executing some
code for each pixel on your screen: there are a ton of pixels, but they mostly run their own independent logic.
The updated scene data is then prepped and fed to the **GPU** for **geometry processing**. Here
we figure out where everything ends up on our screen by doing lots of fancy linear algebra.
We'll cover this stage in depth very soon so don't panic (yet).
Afterwards, the final geometric data is converted into **pixels** (short for **picture-elements**; the 3D
analogue, voxels, are **volumetric-elements**) and prepped for the **pixel processing** stage via a process
called **rasterization**.
In other words, this stage converts a rather abstract and internal representation (geometry)
into something more concrete (pixels). It's called rasterization because the end product is a **raster**
of pixels (a rectangular pattern of parallel scanning lines, per Oxford Languages; from the German *Raster*,
literally 'screen', from the Latin *rastrum*, 'rake').
The **pixel processing** stage then uses the rasterized geometry data (pixel data) to do **lighting**, **texturing**,
and all the sweet gory details of a scene (like a murder scene).
This stage is often, but not always, the most computationally expensive.
A huge problem that a good rendering engine needs to solve is how to be **performant**. And a great deal
of **optimization** can be done through **culling** the work that we can deem unnecessary/redundant in each
stage before it's passed on to the next. More on **culling** later so don't worry (yet :D).
The pipeline will then serve (present) the output of the **pixel processing** stage, which is a **rendered image**,
to your pretty eyes using the target **surface**: usually a monitor, but it can be anything used for
display purposes, like a VR headset or some other crazy surface.
But to avoid drowning you in overviews, let's jump right into the gory details of the **geometry processing**
stage and have a recap afterwards!
## Surfaces
Ever been jump-scared by this sight in a first-person (shooter) game? Why are (the insides of) things rendered like that?
In order to display a (murder) scene,
we need to have a way of **representing** the **surface** of its composing objects (like corpses) in computer memory.
We only care about the **surface** since we won't be seeing the insides anyway---not that we want to.
At this stage, we only care about the **shape** or the **geometry** of the **surface**.
Texturing, lighting, and all the sweet gory details come at a much later stage once all the **geometry** has been processed.
But how do we represent surfaces in computer memory?
## Vertices
There are several ways to **represent** the surfaces of 3D objects for a computer to understand.
For instance,
**NURBS** (non-uniform rational basis spline) surfaces, a mathematical model built on **basis splines**
(B-splines), are great for representing **curves**; they offer the flexibility and **high precision** needed
for Computer-Aided Design (CAD). We could also do **ray-tracing** using fancy equations for
rendering **photo-realistic** images.
These are all great---ignoring the fact that they would take an eternity to process...
But what we need is a **performant** approach that can do this for an entire scene with
hundreds of thousands of objects (like a lot of corpses) in under a small fraction of a second. What we need is **polygonal modeling**.
**Polygonal modeling** enables us to do an exciting thing called **real-time rendering**. The idea is that we only need an
**approximation** of a surface to render it **realistically enough** for us to have some fun killing time!
We can achieve this approximation using a collection of **triangles**, **lines**, and **dots** (primitives),
which themselves are composed of a series of **vertices** (points in space).
A **vertex** is simply a point in space.
Once we get enough of these **points**, we can connect them to form **primitives** such as **triangles**, **lines**, and **dots**.
And once we connect enough of these **primitives** together, they form a **model** or a **mesh** (that we need for our corpse).
With some interesting models put together, we can compose a **scene** (like a murder scene :D).
But let's not get ahead of ourselves. The primary type of **primitive** that we care about during **polygonal modeling**
is a **triangle**. But why not squares or polygons with a variable number of edges?
## Why Triangles?
In **Euclidean geometry** (developed by **Euclid** around 300 BCE and still the workhorse of engineering
and **computer graphics**), triangles are always **planar** (they exist only in one plane);
any polygon composed of more than 3 points may break this rule. But why is a polygon residing in one plane
so important to us?
When a polygon exists only in one plane, we can safely assume that **only one face** of it can be visible
at any one time; this enables us to utilize a huge optimization technique called **back-face culling**,
which means we avoid wasting a ton of **precious processing time** on the polygons that
we know won't be visible to us. We can safely **cull** the **back-faces** since we won't
be seeing the **back** of a polygon when it's part of a closed-off model.
We figure this out by simply using the **winding order** of the triangle to determine whether we're looking at the
back of the triangle or the front of it.
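To make this concrete, here's a minimal sketch (not how any particular driver implements it) that decides a triangle's facing from the 2D cross product of its projected vertices, assuming a counter-clockwise front-face convention:

```cpp
#include <array>

struct Vec2 { float x, y; };

// Twice the signed area of the triangle, taken from the z-component of the
// cross product of two of its edges. The sign encodes the winding order.
float signedArea(const Vec2& a, const Vec2& b, const Vec2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// With a counter-clockwise front-face convention, a negative signed area
// means we're looking at the back of the triangle, so we can cull it.
bool isBackFacing(const std::array<Vec2, 3>& tri) {
    return signedArea(tri[0], tri[1], tri[2]) < 0.0f;
}
```

Real APIs let you configure both the front-face winding and which faces get culled, so the sign check above is just one possible choice.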
Triangles also have a very small **memory footprint**; for instance, when using the **triangle-strip** topology (more on this very soon), for each additional triangle after the first one, only **one extra vertex** is needed.
The most important attribute, in my opinion, is the **algorithmic simplicity**.
Any polygon or shape can be composed from a **set of triangles**; for instance, a rectangle is simply **two coplanar triangles**.
Also, it is a common practice in computer science to break down hard problems into simpler, smaller problems.
This will be a lot more convincing when we cover the **rasterization** stage :)
Present-day **hardware** and **algorithms** have also become **extremely efficient** at processing
triangles (doing operations such as sorting, rasterizing, etc.) after eons of evolving around them.
## Primitive Topology
So, we've got our set of vertices, but having a bunch of points floating around wouldn't make a scene very lively
(or gory); we need to form **triangles** out of them to compose **models** (corpses xd).
We communicate to the computer the **topology** (the way in which constituent parts are interrelated or
arranged; from the Greek *topos*, 'place') of the primitives to be generated from our set of vertices by
configuring the **primitive topology** of the **input assembler**.
We'll get into the **input assembler** bit in a second, but let's clarify the topology with some examples.
When the topology is **point list**, each **consecutive vertex** (v) defines a **single point** primitive (p)
and the number of primitives (n_p) equals the number of vertices (n_v).
```math
\begin{aligned}
&p_i = \{ v_{i} \} \\ &n_p = n_v
\end{aligned}
```
When the topology is **line list**, each **consecutive pair of vertices** defines a **single line**
```math
\begin{aligned}
&p_i = \{ v_{2i},\ v_{2i+1} \} \\ &n_p = ⌊ n_v / 2 ⌋
\end{aligned}
```
When the primitive topology is line strip, **one line** is defined by each **vertex and the following vertex**, according to the equation:
```math
\begin{aligned}
&p_i = \{ v_i, v_{i+1} \} \\ &n_p = \text{max}(0, n_v - 1)
\end{aligned}
```
When the primitive topology is triangle list, each **consecutive set of three vertices** defines a **single triangle**, according to the equation:
```math
\begin{aligned}
&p_i = \{ v_{3i}, v_{3i+1}, v_{3i+2} \} \\ &n_p = ⌊n_v / 3⌋
\end{aligned}
```
When the primitive topology is triangle strip, **one triangle** is defined by each **vertex and the two vertices that follow it**, according to the equation:
```math
\begin{aligned}
&p_i = \{ v_i,\ v_{i + (1 + i \bmod 2)},\ v_{i + (2 - i \bmod 2)} \} \\ &n_p = \text{max}(0, n_v- 2)
\end{aligned}
```
When the primitive topology is triangle fan, triangles are defined **around a shared common vertex**, according to the equation:
```math
\begin{aligned}
&p_i = \{ v_{i+1}, v_{i+2}, v_0 \} \\ &n_p = \text{max}(0, n_v - 2)
\end{aligned}
```
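To see that bookkeeping spelled out, here's a small illustrative sketch (pure index arithmetic, no graphics API) that assembles triangles the way the triangle-list and triangle-strip equations above describe:

```cpp
#include <cstdint>
#include <vector>

struct Triangle { uint32_t a, b, c; };

// Triangle list: every consecutive group of three vertices forms one triangle.
std::vector<Triangle> assembleTriangleList(uint32_t vertexCount) {
    std::vector<Triangle> triangles;
    for (uint32_t i = 0; i + 2 < vertexCount; i += 3)
        triangles.push_back({i, i + 1, i + 2});
    return triangles;
}

// Triangle strip: each vertex plus the two vertices that follow it forms a
// triangle; the last two are swapped on odd triangles (the "i mod 2" term in
// the equation above) so the winding order stays consistent.
std::vector<Triangle> assembleTriangleStrip(uint32_t vertexCount) {
    std::vector<Triangle> triangles;
    for (uint32_t i = 0; i + 2 < vertexCount; ++i) {
        if (i % 2 == 0) triangles.push_back({i, i + 1, i + 2});
        else            triangles.push_back({i, i + 2, i + 1});
    }
    return triangles;
}
```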
## Indices
Indices are an array of integers that reference vertices in a vertex buffer.
They define the order in which vertices are used to form primitives (triangles, strips, etc.),
allowing vertex reuse and reducing memory usage. Instead of duplicating vertex data, indices let you build complex geometry efficiently.
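As a quick hypothetical example, a single quad can be described with four unique vertices and six indices (using the triangle-list topology), so the two shared corners aren't duplicated:

```cpp
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// Four unique corners of a quad...
std::vector<Vertex> vertices = {
    {-0.5f, -0.5f, 0.0f},  // 0: bottom-left
    { 0.5f, -0.5f, 0.0f},  // 1: bottom-right
    { 0.5f,  0.5f, 0.0f},  // 2: top-right
    {-0.5f,  0.5f, 0.0f},  // 3: top-left
};

// ...referenced six times to form two triangles. Without an index buffer we
// would have to store two of the vertices twice.
std::vector<uint32_t> indices = {
    0, 1, 2,  // first triangle
    2, 3, 0,  // second triangle
};
```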
## **Input Assembler**
Every section before this explained the terminology needed to grasp this one.
Sections colored in yellow are concrete pipeline stages where some code gets executed,
processing the data we feed in based on the configuration we set.
The **vertices** and **indices** are provided to this stage via something we call **buffers**.
So technically we have to provide **two** buffers here: a **vertex buffer** and an **index buffer**.
To give you yet another overview, this is the diagram of the **geometry processing** section of
our pipeline:
Draw --> Input Assembler -> Vertex Shader -> Tessellation Control Shader -> Tessellation Primitive Generator -> Tessellation Evaluation Shader -> Geometry Shader -> Vertex Post-Processing -> ... Rasterization ...
## Coordinate System -- Overview
We got our surface representation (vertices), we got our indices, we set the primitive topology type, and we gave these
to the **input assembler** to spit out triangles for us.
**Assembling primitives** is the **first** essential task in the **geometry processing** stage, and
everything you read so far only went over that part.
Its **second** vital responsibility is the **transformation** of the said primitives. Let me explain.
So far, all the examples show the geometry in NDC (Normalized Device Coordinates).
This is because the **rasterizer** expects the final vertex coordinates to be in the NDC range.
Anything outside of this range is **clipped** and therefore not visible.
Yet, as you'll soon understand, doing everything in **NDC** is inconvenient and very limiting.
What we'd like to do instead is transform these vertices through 5 different coordinate systems before they
end up in NDC (or outside of it, if they're meant to be clipped).
The purpose of each space will be explained shortly. But doing these **transformations** requires
a lot of **linear algebra**, specifically **matrix operations**.
I'll give you a brief refresher on the mathematics needed to understand the coordinate systems.
But if you feel extra savvy you may skip the following **linear algebra** sections.
The concepts in the following sections may be difficult to grasp at first. And **that's okay**, you don't
need to pick up everything the first time you read them (I didn't). If you feel passionate about these topics
and want to have a better grasp, refer to the references at the bottom of this article and **take
your time** :)
## Linear Algebra --- Vector Operations
**What is a vector**
**Additions and Subtraction**
**Division and Multiplication**
**Scalar Operations**
**Cross Product**
**Dot Product**
**Length**
**Normalization and the normal vector**
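As a compact cheat sheet for the operations listed above, here's a tiny sketch in plain C++ (a real engine would use a math library such as GLM):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Component-wise addition and scaling.
Vec3  add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3  scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// Dot product: a scalar telling how much two vectors point the same way.
float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Cross product: a vector perpendicular to both inputs (handy for normals).
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

float length(Vec3 v)         { return std::sqrt(dot(v, v)); }

// Normalization: same direction, length of exactly 1 (a unit vector).
Vec3 normalize(Vec3 v)       { return scale(v, 1.0f / length(v)); }
```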
## Linear Algebra --- Matrix Operations
**What is a matrix**
**Addition and Subtraction**
**Scalar Operations**
**Multiplication**
**Division (or lack thereof)**
**Identity Matrix**
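And a similarly bare-bones sketch of the matrix side (row-major 4x4 matrices, purely for illustration):

```cpp
#include <array>

// Row-major 4x4 matrix: m[row][column].
using Mat4 = std::array<std::array<float, 4>, 4>;

// The identity matrix: multiplying by it changes nothing.
Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Matrix multiplication: each output element is the dot product of a row of
// `a` with a column of `b`. Note that a * b != b * a in general.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 result{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                result[r][c] += a[r][k] * b[k][c];
    return result;
}
```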
## Linear Algebra --- Affine Transformations
All **affine** transformations can be represented as matrix operations using **homogeneous** coordinates.
**What is transformation**
**Scale**
**Rotation**
**Translation**
Why are we using 4D matrices for vertices that are three-dimensional?
**Embedding it all in one matrix**
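To give one standard example of why the extra coordinate matters: translation is not a linear operation in 3D, but with **homogeneous** coordinates (w = 1) it becomes a single matrix multiplication, so it can be chained and embedded together with rotation and scale:

```math
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}
```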
Great! You've refreshed on lots of cool mathematics today, let's get back to the original discussion.
Next up: **transforming** the freshly generated **primitives** through the **five** primary coordinate systems (or spaces),
starting with the **local space**!
## Coordinate System -- Local Space
Alternatively called the **object space**, this is the space **relative** to your object's **origin**.
All objects have an origin, and it's probably at coordinates [0, 0, 0] (not guaranteed).
Think of a modelling application like **Blender**. If you create a cube in it and export it, the
**vertices** it outputs are probably something like this:
**insert outputted vertices**.
And the cube looks plain, like this:
I hope this one is easy to grasp, since we've **technically** been using it in our initial triangle
and square examples already; the local space just happened to be in NDC, though that isn't necessary.
Say we arbitrarily consider each unit to be 1 cm; then a 10m x 10m cube would have the following
vertices while in local space.
Basically, the vertices read from a model file are initially in local space.
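As a rough sketch (an actual exporter would also write normals, texture coordinates, and so on), the local-space positions of a unit cube centered on its origin could look like this:

```cpp
// Local-space positions of a unit cube centered on the object's origin.
// A real model file would carry more per-vertex data than just positions.
float cubeVertices[8][3] = {
    {-0.5f, -0.5f, -0.5f}, { 0.5f, -0.5f, -0.5f},
    { 0.5f,  0.5f, -0.5f}, {-0.5f,  0.5f, -0.5f},
    {-0.5f, -0.5f,  0.5f}, { 0.5f, -0.5f,  0.5f},
    { 0.5f,  0.5f,  0.5f}, {-0.5f,  0.5f,  0.5f},
};
```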
## Coordinate System -- World Space
This is where our first **transformation** happens. If we were constructing a crime scene
without world space transformations, all our corpses would reside somewhere around [0, 0, 0] and
would be inside each other (horrid, or lovely?).
This transformation allows us to **compose** a (game) world by transforming all the models out of
their local space and scattering them around the world. We can **translate** (move) the model to the desired
spot, **rotate** it because why not, and **scale** it if the model needs scaling (captain obvious here).
This transformation is stored in a matrix called the **model matrix**. This is the first of three primary
**transformation** matrices that get multiplied with our vertices:
```math
\text{model}_M * \text{local}_V
```
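A common (though by no means mandatory) way to build the model matrix is to compose translation, rotation, and scale, which are then applied to the vertex from right to left:

```math
\text{model}_M = \text{translation}_M * \text{rotation}_M * \text{scale}_M
```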
So one down, two more to go!
## Coordinate System -- View Space
Alternative names include the **eye space** or the **camera space**.
This is where the crucial element of **interactivity**
comes to life (well, depending on whether you can move the view in your game or not).
Currently, we're looking at the world
through a fixed lens. Since everything that's rendered will be in the [-1.0, 1.0] range,
**moving** ourselves, our **eyes**, or the game's **camera** doesn't have a real meaning.
Now it's you that's stuck! (haha). But don't worry your lazy ass; instead of moving yourself
(which, again, would not make sense since everything visible ends up in NDC), you can move the world! (how entitled).
We can achieve this illusion of moving around the world by **reverse transforming** everything based
on our own **location** and **orientation**. So imagine we're standing at coordinates [+10.0, 0.0, 0.0]. To simulate
this movement, we translate the entire world in the opposite direction.
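Filling in the matrix this describes (assuming, as above, that the camera has only moved along the x axis), the reverse translation looks like this:

```math
\begin{bmatrix} 1 & 0 & 0 & -10 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x - 10 \\ y \\ z \\ 1 \end{bmatrix}
```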
** Position **
** Orientation **
We can **rotate** the camera, or more accurately **reverse-rotate** the world, via 3 unit vectors snuggled
inside a matrix: the **up** vector (U), the **target** or **direction** vector (D), and the **right**
vector (R):
```math
\begin{bmatrix} \color{red}{R_x} & \color{red}{R_y} & \color{red}{R_z} & 0 \\ \color{green}{U_x} & \color{green}{U_y} & \color{green}{U_z} & 0 \\ \color{blue}{D_x} & \color{blue}{D_y} & \color{blue}{D_z} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & -\color{purple}{P_x} \\ 0 & 1 & 0 & -\color{purple}{P_y} \\ 0 & 0 & 1 & -\color{purple}{P_z} \\ 0 & 0 & 0 & 1 \end{bmatrix}
```
">>>>>" explain in depth why such operation makes the view rotate.
Just like the **world space** transformation, which is stored in the **model matrix**,
this transformation is stored in another matrix called the **view matrix**.
So far we got this equation to apply the **world space** and **view space** transformations
to the **local space** vertices of our model:
```math
\text{view}_M * \text{model}_M * \text{local}_V
```
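If you prefer code over matrices, here's a minimal sketch of building such a view matrix from the camera position and its basis vectors (row-major, column-vector convention; it's the rotation-times-translation product above, multiplied out by hand, and the types are toy stand-ins rather than a real math library):

```cpp
#include <array>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][column]

// Build a view matrix from the camera position P and its orthonormal basis:
// right (R), up (U), and direction (D).
Mat4 makeViewMatrix(Vec3 R, Vec3 U, Vec3 D, Vec3 P) {
    return {{
        { R.x, R.y, R.z, -(R.x * P.x + R.y * P.y + R.z * P.z) },
        { U.x, U.y, U.z, -(U.x * P.x + U.y * P.y + U.z * P.z) },
        { D.x, D.y, D.z, -(D.x * P.x + D.y * P.y + D.z * P.z) },
        { 0.0f, 0.0f, 0.0f, 1.0f },
    }};
}
```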
That's two down, one left to slay!
## Coordinate System -- Clip Space
**Overview**
**Aspect Ratio**
**Field of view**
**Normalization**
**Putting it all together**
```math
\text{projection}_M * \text{view}_M * \text{model}_M * \text{local}_V
```
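The exact projection matrix depends on the convention you pick; as one common example (the classic OpenGL-style setup with vertical field of view *fov*, aspect ratio *a*, near plane *n*, and far plane *f*, where the camera looks down the negative z axis and depth is mapped to [-1, 1]), it looks like this:

```math
\text{projection}_M = \begin{bmatrix} \frac{1}{a \cdot \tan(fov/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(fov/2)} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}
```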
## Coordinate System -- Screen Space
**Viewport transform**
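As a sketch of that final step (assuming an NDC range of [-1, 1] for x and y and ignoring the depth range for now), the viewport transform maps NDC to pixel coordinates like this:

```math
\begin{aligned}
&x_{screen} = (x_{ndc} + 1) \cdot \frac{width}{2} + x_{offset} \\
&y_{screen} = (y_{ndc} + 1) \cdot \frac{height}{2} + y_{offset}
\end{aligned}
```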
## Coordinate System -- Recap
## Vertex Shader
## Tessellation & Geometry Shaders
## Geometry Processing -- Recap
## Rasterization
Remember the god-forsaken **input assembler**? Let's expand our understanding of it,
since, for simplicity's sake, we skipped over the fact that **vertices** can hold much, much more data
than just positions.
## Pixel Processing
## Output Merger
## The Future
## Conclusion
## Sources
MMZ ❤️
[Tomas Akenine Moller --- Real-Time Rendering](https://www.realtimerendering.com/intro.html)
[Gabriel Gambetta --- Computer Graphics from Scratch](https://gabrielgambetta.com/computer-graphics-from-scratch/)
[JoeyDeVries --- LearnOpenGL](https://learnopengl.com/)
[Polygonal Modeling](https://en.wikipedia.org/wiki/Polygonal_modeling)
[Non-uniform Rational B-spline Surfaces](https://en.wikipedia.org/wiki/Non-uniform_rational_B-spline)
[Computer Aided Design (CAD)](https://en.wikipedia.org/wiki/Computer-aided_design)
[Rasterization](https://en.wikipedia.org/wiki/Rasterisation)
[Euclidean geometry](https://en.wikipedia.org/wiki/Euclidean_geometry)
[Miolith --- Quick Understanding of Homogeneous Coordinates for Computer Graphics](https://www.youtube.com/watch?v=o-xwmTODTUI)
[Leios Labs --- What are affine transformations?](https://www.youtube.com/watch?v=E3Phj6J287o)
[3Blue1Brown --- Essence of linear algebra (highly recommended playlist)](https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab)
[3Blue1Brown --- Quaternions and 3d rotation, explained interactively](https://www.youtube.com/watch?v=zjMuIxRvygQ)
[pikuma --- Math for Game Developers (playlist)](https://www.youtube.com/watch?v=Do_vEjd6gF0&list=PLYnrabpSIM-93QtJmGnQcJRdiqMBEwZ7_)
[pikuma --- 3D Graphics (playlist)](https://www.youtube.com/watch?v=Do_vEjd6gF0&list=PLYnrabpSIM-97qGEeOWnxZBqvR_zwjWoo)
[Cem Yuksel --- Introduction to Computer Graphics (playlist)](https://www.youtube.com/watch?v=vLSphLtKQ0o&list=PLplnkTzzqsZTfYh4UbhLGpI5kGd5oW_Hh)
[javidx9 --- Essential Mathematics For Aspiring Game Developers](https://www.youtube.com/watch?v=DPfxjQ6sqrc)
[Stackoverflow --- Why do 3D engines primarily use triangles to draw surfaces?](https://stackoverflow.com/questions/6100528/why-do-3d-engines-primarily-use-triangles-to-draw-surfaces)
[The ryg blog --- The barycentric conspiracy](https://fgiesen.wordpress.com/2013/02/06/the-barycentric-conspirac/)
[Juan Pineda --- A Parallel Algorithm for Polygon Rasterization](https://www.cs.drexel.edu/~deb39/Classes/Papers/comp175-06-pineda.pdf)
[Kristoffer Dyrkorn --- A fast and precise triangle rasterizer](https://kristoffer-dyrkorn.github.io/triangle-rasterizer/)
[Vulkan Docs --- Drawing](https://docs.vulkan.org/spec/latest/chapters/drawing.html)
[Vulkan Docs --- Pipeline Diagram](https://docs.vulkan.org/spec/latest/_images/pipelinemesh.svg)