Let's face it, if you want to understand modern 3D programming you need to understand shaders. That should be enough to entice even the laziest programmers, since 3D graphics is one of the sexiest types of programming. Actually, I assume that porn applications would be the sexiest, but I digress. Strangely, I've found many very good programmers struggle to understand the basics of shaders. Beyond that, it took me an incredible amount of time to finally grok the fairly simple concepts.
And they are fairly simple. The only problem is that there are several, stacked on top of each other, and if you screw up one tiny piece, nothing works. I mean anything from a blank screen to your items being pure white or strange technicolor nightmares which haunt your waking moments, making you question your very worth as a programmer. I'm assuming that's what you do since I have no experience in that level of frustration. Anyway, after you recover from the resulting psychotic break, you eventually find some clue as to what you did wrong and start the process over again. My purpose is to save you thousands in doctors' and lawyers' fees.
Why does it seem so hard to learn?
I think we've been teaching this whole idea backwards. This is a different approach which is closer to how I think about things. So, if you are ready to try ONE more time, I hope I can help you make sense of the insanity that is shaders.
Note, I use OpenGL. The concepts are about the same, but if there are differences you should be able to find them afterwards. Second, I'm not showing any real code. Implementation isn't actually that important since you will finally get so tired of the massive amount of code you will write your own libraries. Or if you are experienced and humble, use someone else's.
Let's start at the beginning: A pixel.
A single pixel on the screen. That's simple enough, right? Well, it's a color which we represent as three values, usually a floating point value from 0 to 1. The values are red, green and blue.
That's all there is to light. So (1.0, 1.0, 1.0) would be white. And (0.0, 0.0, 0.0) would be black, (0.0, 1.0, 0.0) green, etc. We call a list of numbers a *vector*, because we need to call it something and vectors sound cool.
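If it helps to see a color typed out, here is a tiny Python sketch. It is not graphics code, just the idea of a color as a three-value vector. The helper names `to_bytes` and `mix` are mine, made up for illustration.

```python
# A color is just three numbers (red, green, blue), each from 0.0 to 1.0.
WHITE = (1.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)

def to_bytes(color):
    """Convert a 0..1 float color to the 0..255 integers many displays store."""
    return tuple(round(max(0.0, min(1.0, c)) * 255) for c in color)

def mix(a, b, t):
    """Blend two colors component by component; t=0 gives a, t=1 gives b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))
```

So `mix(BLACK, WHITE, 0.5)` is a middle gray, and that "do the same math to every component" habit is most of what working with vectors feels like.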
Advanced programmers, please stay with me here. In the olden days (think the age of VGA), you would set the screen's resolution (how many pixels wide and high) and then you would simply write to a memory location. So putting a pixel on the screen was as easy as writing to memory addresses. It's the same as putting values in your own variables, once you get used to it.
So that's how we used to do things. Your program, however, had to write to all the pixels on the screen. At least the ones that changed, but finding which ones changed and which didn't took longer to compute than just drawing the pixels. So you have your poor little processor, a single core back then, trying to push pixels and run the rest of your system including the rest of your graphics app.
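To make the "screen is just memory" idea concrete, here is a rough Python imitation. The resolution, the three-bytes-per-pixel layout, and the names `put_pixel` and `get_pixel` are my own assumptions for illustration. Real VGA modes often stored a palette index per pixel instead.

```python
WIDTH, HEIGHT = 320, 200      # a small, old-school resolution (illustrative)
BYTES_PER_PIXEL = 3           # one byte each for red, green, blue (a simplification)

# The "screen" is just one flat block of memory.
framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)

def put_pixel(x, y, r, g, b):
    """Write one pixel by computing where it lives in memory, row by row."""
    offset = (y * WIDTH + x) * BYTES_PER_PIXEL
    framebuffer[offset:offset + 3] = bytes((r, g, b))

def get_pixel(x, y):
    """Read the three bytes back from the same memory location."""
    offset = (y * WIDTH + x) * BYTES_PER_PIXEL
    return tuple(framebuffer[offset:offset + 3])

put_pixel(10, 5, 0, 255, 0)   # a green dot at column 10, row 5
```

Filling the whole screen this way means one write per pixel, every frame, all on the CPU. That is the workload the rest of this story is trying to get rid of.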
In games, you have a lot more to do than just push pixels. Ideally you don't want the processor to do *any* rendering at all.
Moreover, you want a completely different processor to handle the drawing. A processor made exclusively for drawing stuff quickly. Actually, you really want a bunch of processors to do it for you. Better yet, you want a processor for each pixel! Or even more so you can queue them up for the next write.
So some smart guys created the graphics card. And it was good. But we programmers had to have a way to talk to the card. But there wasn't a standard way to communicate with the graphics cards, so you either wrote for each card separately or just used it as a glorified VGA card. Finally all the vendors (sort of) got their act together and figured out a general interface to use the massive power of these cards. So what did they change? Well, from the screen's perspective, nothing. It still got a list of pixels which had 3 values each.
Again, we will look at one pixel. Just pick a pixel on the screen. I like the upper left one, but it can be any pixel, like the one that's stuck a slightly different color, right next to your cursor. The one that drives you mad, like a splinter in the mind's eye. The one that would be fine if you could just stop looking at it! Come to think of it, I'll use that one.
That pixel has a secret. It's been hiding an entire graphics processor behind it. Every single one of your pixels has one, but this one is for it alone. This complete computer processor's only purpose is to spit 3 numbers to your pixel on the screen. Let me repeat that: every PIXEL HAS A PROCESSOR. And they changed the name from pixel to fragment. And since it mostly "shades" your screen (I don't get it either), this processor's program is called a *fragment shader*.
So what's the point of that? You are just giving it a few colors already, right? Well you *can* do that.
But you can also do more complex things like sending it several different values and having your fragment's processor figure out the math. Your main processor could do it, but on an 800x600 screen it would need to do it 480,000 times. And that's low resolution these days.
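Here is a toy Python version of that workload, assuming a made-up `fragment_shader` that computes a simple color gradient. A real fragment shader runs on the GPU in a shader language, so treat this as a picture of the idea only: one small function, called once per fragment.

```python
WIDTH, HEIGHT = 800, 600

def fragment_shader(x, y):
    """Return the (r, g, b) color for one fragment; here, a simple gradient."""
    return (x / (WIDTH - 1), y / (HEIGHT - 1), 0.0)

def render():
    """What a lone CPU would have to do: one shader call per pixel."""
    calls = 0
    image = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            row.append(fragment_shader(x, y))
            calls += 1
        image.append(row)
    return image, calls
```

The function itself is trivial. The pain is the loop around it, and that loop is exactly the part a GPU spreads across its many processors.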
So good, we have a processor for each and they can do math. But the curious might wonder, "Wouldn't writing the data to each processor take almost as much time?" And they would be right!
Ideally we would like to describe a scene in simple terms, hand it over to the Graphics Processing Unit (GPU) and be done with it. So they also did that. So now we have a processor behind all your fragments which decodes a simplified description and turns it over to the thousands of fragments.
But what exactly do we send to this extra processor?
Triangles. In fact, that's ALL you can draw. Everything you've seen in every game is a triangle. You might enter them as squares, which are two triangles. Or cubes, which are 12. Or spheres which can be any number of triangles. Or a dog, which is some huge number.
What? How can this be? They look smooth! What sorcery is this! We'll get to that another time. Right now I need to focus on a single triangle. A single triangle is pretty cool. You can cover as much or as little of the screen as you want with one. And all you need is 3 vertexes, each of which is a vector. In 3D you need three values for each vertex, leaving us 9 values total for a triangle. That's way better than trying to blast the screen with possibly thousands of color values. In fact, if a fragment isn't inside the triangle then the GPU doesn't bother with it. We can just clear the screen and only draw what we need.
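As a sketch of "only the fragments inside the triangle get touched", here is one common way to test it in Python, using the sign of three edge checks. The function names (`edge`, `inside`, `covered_fragments`) and the 0-to-1 screen coordinates are assumptions for this example. Real GPUs do this in hardware and with more care.

```python
# A triangle in 3D is 3 vertexes with 3 values each: 9 numbers total.
triangle = [
    (0.0, 0.0, 0.0),
    (0.5, 0.0, 0.0),
    (0.0, 0.5, 0.0),
]

def edge(a, b, p):
    """Signed area test: which side of the line a->b the point p falls on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def inside(tri, p):
    """True if the 2D point p is inside the triangle (z is ignored here)."""
    a, b, c = tri
    e0, e1, e2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

def covered_fragments(tri, width, height):
    """List the pixel coordinates whose centers land inside the triangle."""
    hits = []
    for y in range(height):
        for x in range(width):
            center = ((x + 0.5) / width, (y + 0.5) / height)
            if inside(tri, center):
                hits.append((x, y))
    return hits
```

Calling `covered_fragments(triangle, 8, 8)` on a tiny 8x8 "screen" returns only a handful of the 64 pixels, which is the whole point: nine numbers in, and only the covered fragments do any work.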
And now we need more processors, one for each corner (or vertex) of the triangle. And these processors, which hand values to the fragment shaders, are called *vertex shaders*. They only take vertexes, and the hidden parts of the processor take their output, create triangles, and send the triangle data to fragment shaders. So we just create a list of triangles and, if we like, lists of values to help draw them, such as each vertex's color. The vertex shader plays with those values, one for each vertex, and then the fragment shaders are used to draw the triangle itself. Pretty clever, huh?
So let's recap:
- A vertex shader only handles a single corner (vertex) of a triangle.
- A vertex shader takes several lists (or arrays) of data and puts them in a digital blender (or *transforms* them).
- The GPU then takes 3 of these outputs and creates a triangle which it then sends to the appropriate fragment shaders.
- Each fragment shader has its own blender (which also *transforms* the data).
- Finally, the fragment shader shoots the result to the screen.
That wasn't so bad, was it? It's a big multiplexer. And don't bother looking to see if your GPU has enough cores to give each fragment its own; it doesn't matter. They are virtual, but it looks like there is one for each fragment and vertex.
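The recap above can be imitated in a few lines of Python. This is a toy model, not how a driver works: `vertex_shader`, `fragment_shader`, `barycentric`, and `draw_triangle` are names I made up, the screen uses 0-to-1 coordinates with row 0 at the bottom, and the blending between corners uses barycentric weights, which is one standard way to do it.

```python
def vertex_shader(position, color):
    """Runs once per vertex. Here it passes its values through untouched."""
    return position, color

def barycentric(a, b, c, p):
    """Weights saying how close p is to each corner; they sum to 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb

def fragment_shader(color):
    """Runs once per covered fragment. Here it just outputs the blended color."""
    return color

def draw_triangle(positions, colors, width, height):
    """The 'hidden' GPU part: build the triangle, find fragments, blend, shade."""
    shaded = [vertex_shader(p, c) for p, c in zip(positions, colors)]
    (a, ca), (b, cb), (c, cc) = shaded
    screen = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]  # cleared to black
    for y in range(height):
        for x in range(width):
            p = ((x + 0.5) / width, (y + 0.5) / height)
            wa, wb, wc = barycentric(a, b, c, p)
            if wa >= 0 and wb >= 0 and wc >= 0:  # fragment is inside the triangle
                blended = tuple(wa * ca[i] + wb * cb[i] + wc * cc[i] for i in range(3))
                screen[y][x] = fragment_shader(blended)
    return screen
```

Give it three corners colored red, green, and blue and each covered fragment comes out as a mix weighted by how close it sits to each corner. Fragments outside the triangle stay black because nobody ever ran a shader for them.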
And Now for Something Completely Different.
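Let's say the lower left corner of the screen is 0,0 and the upper right is 1,1. (OpenGL's own screen coordinates actually run from -1 to 1 with 0,0 in the center, but the 0-to-1 version keeps the arithmetic simple, so play along.) If I draw a triangle, say from 0,0 to 0.5,0 to 0,0.5, I will end up with a triangle covering the bottom left hand side of the screen.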
About an eighth of the screen will be covered. Which is good, if you are into that sort of thing. But unless you have a perfectly square screen, the left and bottom sides will be a different length. That's not a problem yet, but when you start rotating triangles it will be. It will distort them horribly.
When you show it to people they will look at it, tell you it looks bad and you should feel bad. People will laugh at you in the streets. You will lose all your friends and family. Soon you start binge drinking Red Bull and coffee. Before you know it you wake up in a trailer park in Minnesota surrounded by five very unhappy Norwegians.
So let's not do that.
How can we make our fragments square again? Well, we could multiply the short side by a number larger than 1. Or divide the long side by that same number. That means my triangle will be transformed slightly. If we divide the long side we will have a triangle which is the same height and width. Now how do we do that in the computer? Well, we could modify every vertex on the main CPU before we send it to the graphics card. Or we can just give the GPU the list of changes we need and let the vertex shader transform it.
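Here is the "divide the long side" version worked through in Python for an 800x600 screen. The helper names `square_up` and `to_pixels` are hypothetical, and the 0-to-1 coordinates are the simplified ones used in this article.

```python
def square_up(vertex, width, height):
    """Shrink the coordinate along the longer screen side so shapes keep their proportions."""
    x, y = vertex
    if width >= height:
        return (x * height / width, y)   # same as dividing x by the aspect ratio
    return (x, y * width / height)       # tall screen: shrink y instead

def to_pixels(vertex, width, height):
    """Convert a 0..1 screen coordinate into pixel units."""
    x, y = vertex
    return (x * width, y * height)

WIDTH, HEIGHT = 800, 600
corner = (0.5, 0.5)   # half the width across and half the height up

before = to_pixels(corner, WIDTH, HEIGHT)
after = to_pixels(square_up(corner, WIDTH, HEIGHT), WIDTH, HEIGHT)
```

Before the fix, 0.5 across is 400 pixels but 0.5 up is only 300, so a "square" comes out as a rectangle. After it, both are 300 pixels. In real life this small piece of arithmetic is something you would hand to the vertex shader rather than run on the CPU for every vertex.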
Let's do that! And we can even do more. I don't like the bottom corner being 0,0. I want 0,0 to be the center of the screen. That's also very simple: after we "square the fragments", we add 0.5 to each vertex. Now the triangle is shifted to the center of the screen.
And we can do more, we can rotate the vertexes, or multiply them to make the triangle bigger, or move it somewhere else. And that's a real pain to try to remember. You know what would be great? A simple way to describe these changes and pass them to the vertex shaders. Maybe as a bunch of equations? But that might get difficult to remember as well. Maybe we could apply some cleverness to the problem.
Well, I couldn't, but someone smarter than me figured out an elegant solution. They used grids of numbers in some really cool ways. 4x4 grids, to be precise. They are used to transform our vertex's location on the screen. In fact we call them transforms. They are a very special kind of transform called Linear Transforms. If you want to know about this secret sauce you can read up on Linear Algebra. For the sake of this tutorial, we will treat them like magic. I will describe what they do and you will just imagine that's what they do.
The first question is why did they pick 4x4 values? Well, that's the minimum we need to rotate, scale and translate (move) vertexes (and by extension triangles) in 3D space. Do I have your attention? This is how we move things around in 3D space. Remember, we can only rotate, scale, and translate objects. Also, I really hate the term translate because it's so close to transform, so I'll just use move.
Now let's talk about math in general. As far as I can tell, mathematicians are trying to turn everything into high school algebra. Whenever they find something new they try to add it to something or multiply it with something. And much like different objects in software, different types of objects can be added or multiplied in different ways. A scalar (a normal number) can be multiplied against a row of numbers. Or a grid, it doesn't matter. The point is that each operation can either combine two things or, if we reverse the process, turn one thing into two. A quick example:
(4 * 5) + 2 = 20 + 2 = 22 // So by combining 2 things we can combine as many as we like.
44 = 11 * 4 = (7 + 4) * 4 = 28 + 16 = 44 // So one thing becomes several and then back again
This is called composition and decomposition. The important thing is that each of those representations is identical to a mathematician. They mean the same thing.
To us, some are more of a pain. With what we are doing, however, we can compose the previous operations (moving, scaling, rotating) in a series by multiplying matrix transforms. Then we can multiply our vertex (which is a vector) against it and get our new location. This is the majority of what a vertex shader does. It takes a list of these matrices and multiplies them with the vertex it's working with. And this little bit of magic can be explained somewhere else.
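For the curious, here is a bare-bones Python sketch of those 4x4 grids, written with plain lists so nothing is hidden. The function names (`move`, `scale`, `rotate_z`, `multiply`, `apply`) are my own, and I am assuming the common convention where the vertex is a column with an extra 1 tacked on the end. That extra value is what lets a multiplication perform a move, and it is why the grid is 4x4 instead of 3x3.

```python
import math

def identity():
    """The 'do nothing' grid: ones on the diagonal, zeros elsewhere."""
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

def move(dx, dy, dz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = dx, dy, dz
    return m

def scale(sx, sy, sz):
    m = identity()
    m[0][0], m[1][1], m[2][2] = sx, sy, sz
    return m

def rotate_z(angle):
    """Rotate counterclockwise around the z axis; angle is in radians."""
    c, s = math.cos(angle), math.sin(angle)
    m = identity()
    m[0][0], m[0][1] = c, -s
    m[1][0], m[1][1] = s, c
    return m

def multiply(a, b):
    """Compose two transforms into one grid (b happens first, then a)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def apply(m, vertex):
    """Multiply a 3D vertex (with a 1 appended) by the grid."""
    v = (vertex[0], vertex[1], vertex[2], 1.0)
    out = [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]
    return tuple(out[:3])

# Scale by 2, then rotate a quarter turn, then move. Read it right to left.
combined = multiply(move(0.5, 0.5, 0.0),
                    multiply(rotate_z(math.pi / 2), scale(2.0, 2.0, 1.0)))
moved = apply(combined, (1.0, 0.0, 0.0))   # roughly (0.5, 2.5, 0.0)
```

The payoff is that `combined` is a single grid. The CPU builds it once, and every vertex shader does the same one multiplication no matter how many steps went into it.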
Mommy, Where Do Shaders Come From?
Well, shaders are programs. They take input and produce output. The only unique thing about them is they usually talk to multiple outputs. When you draw something, you select both a vertex and fragment shader. Only one of each. So if you draw a person as a single object then you have to make shaders which handle all the colors, bumpiness, shininess, etc which you need. Then you select your next object and do it again. You do this until your image is done and then you tell the GPU to print it to the screen. You might have dozens of shaders in a program, although that's rare.
Since shaders are programs, the people that invented the idea of shaders decided to make them like other programs. What kind of programs? C or C++ programs. And you have to do it without your fancy graphical interfaces (IDEs). So you need to know about the general idea of compiling code.
To start with, a compiler is a program that inputs source code and outputs executable code, or your end program. But, in order to make things simple it is broken up into pieces. There is a high level compiler, which turns your code into an object file, which is called compiling. The object file contains the machine code but not the library references, entry points or even your other object files which it will need when it runs. This is done on each component, usually a single file. Then these object files are combined into a program which is called linking. Only then can we run our program.
So we have the process: compiling, linking and running. Shaders do the same thing. First you compile your vertex shader. Then you compile your fragment shader. Then you link them together into a program. Then you save it until you need it. Finally, when you need it you run it.
So we take some text, usually a string, which details the steps for the vertex shader and send it to your video driver's shader compiler and get our first shader. We do the same with our fragment shader. With vertex and fragment shaders in hand, we stick them into a shader program. When we are ready to use the program, we enable it.
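To show the shape of those steps, here is a hedged Python sketch. The two strings are minimal GLSL shaders I wrote for illustration (the `transform` and `position` names are assumptions). The function expects `gl` to behave like PyOpenGL's `OpenGL.GL` module with an OpenGL context already current, which is also an assumption. The `build_program` helper is hypothetical, not part of any library.

```python
VERTEX_SOURCE = """#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 transform;
void main() {
    gl_Position = transform * vec4(position, 1.0);
}
"""

FRAGMENT_SOURCE = """#version 330 core
out vec4 color;
void main() {
    color = vec4(0.0, 1.0, 0.0, 1.0);  // every fragment comes out green
}
"""

def build_program(gl, vertex_source, fragment_source):
    """Compile both shaders, link them into one program, and return its handle."""

    def compile_shader(kind, source):
        shader = gl.glCreateShader(kind)
        gl.glShaderSource(shader, source)      # hand the text to the driver
        gl.glCompileShader(shader)             # step 1: compile
        if not gl.glGetShaderiv(shader, gl.GL_COMPILE_STATUS):
            raise RuntimeError(gl.glGetShaderInfoLog(shader))
        return shader

    vertex = compile_shader(gl.GL_VERTEX_SHADER, vertex_source)
    fragment = compile_shader(gl.GL_FRAGMENT_SHADER, fragment_source)

    program = gl.glCreateProgram()
    gl.glAttachShader(program, vertex)
    gl.glAttachShader(program, fragment)
    gl.glLinkProgram(program)                  # step 2: link
    if not gl.glGetProgramiv(program, gl.GL_LINK_STATUS):
        raise RuntimeError(gl.glGetProgramInfoLog(program))
    return program

# Step 3, later, when it is time to draw with it:
#     gl.glUseProgram(program)
```

Passing `gl` in as a parameter is just a convenience so the function can be exercised without a graphics card. In a real application you would import the module once and call these functions directly.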
And that's the short of it. As you can see, it's a fairly large subject but this is only meant to give an overview so you know what questions to ask. So in the end, it's just a pipe from your CPU, which goes to several vertex shaders which in turn go to several fragment shaders which finally shoot colors to your screen, and into you. And oddly, all that's been going on if you are reading this on a computer screen. Or as Lao Tzu says, pipes within pipes; the gateway to all understanding about shaders.