## A (not) tedious job

Recently I started porting my project to DirectX 11, and it differs from OpenGL in a lot of interesting ways: the coordinate and matrix conventions, stricter type safety requirements, better shader resource management (I haven’t tried OpenGL’s SSBO yet, but the constant buffer is really easier to wrap into an elegant layer), and so on. One thing I got a little stuck on is normal mapping: I used on-the-fly tangent generation in GLSL, and rewriting it in HLSL confused me quite a bit at first.

## Always Review

Basically, the normal vector is a unit **direction** vector perpendicular to a surface. (Mathematically speaking, if the surface is a level set $f\left( {x,y,z} \right) = k$ of a scalar field $f$, the normal $\vec{n}$ at a point $\left( {{x_0},{y_0},{z_0}} \right)$ on it is the normalized gradient $\nabla f = \left\langle {{f_x},{f_y},{f_z}} \right\rangle$ evaluated there, which is always orthogonal/**normal** to the level surface.) With only the surface normal in hand we can do old-style flat shading, which gives one discrete color per face. Later Gouraud (maybe not him) introduced the vertex normal, the average of the normals of the faces that share the vertex; thanks to the interpolation in the pixel processing stage, it gives a much smoother shading result.
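The averaging idea can be sketched on the CPU as follows. This is a minimal Python sketch, not how any particular DCC tool does it: the helper names and the tiny two-triangle mesh are made up for illustration, counter-clockwise winding is assumed, and the face normals are averaged without any area or angle weighting.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)

def face_normal(p0, p1, p2):
    # Assumption: counter-clockwise winding, so the normal follows the right-hand rule.
    return normalize(cross(sub(p1, p0), sub(p2, p0)))

def vertex_normals(positions, triangles):
    # Sum the unit face normals of every triangle touching a vertex, then renormalize.
    sums = [(0.0, 0.0, 0.0) for _ in positions]
    for i0, i1, i2 in triangles:
        n = face_normal(positions[i0], positions[i1], positions[i2])
        for i in (i0, i1, i2):
            sums[i] = (sums[i][0] + n[0], sums[i][1] + n[1], sums[i][2] + n[2])
    return [normalize(s) for s in sums]

# Two triangles forming a 90 degree "roof"; vertices 0 and 1 lie on the shared ridge.
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, -1.0, 0.0), (-1.0, -1.0, 0.0)]
triangles = [(0, 1, 2), (0, 3, 1)]
normals = vertex_normals(positions, triangles)
```

The two ridge vertices end up with a normal pointing straight up (+Y), halfway between the two face normals, while the outer vertices keep the normal of the single face they belong to.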

Typically we get precomputed normal vectors together with the model from DCC tools such as Blender, 3ds Max or Maya (or generate them ourselves with some cross products on the CPU side), and they are stored in the model’s local space. If we want to render anything with their help in world space (strictly speaking, finally in screen space), we need to transform these normal vectors.

The first idea is to use the model’s local-to-world transformation matrix directly. Since a normal represents a direction, its w component is 0 in homogeneous 4D space, so the translation part (the last column of the transformation matrix) has no effect on it. The rotation part is what we want to apply, but if the model has a non-uniform scale, the normal vector gets skewed too and no longer stays perpendicular to the surface: the direction changes! So we need to figure out how to deal with the unwanted scaling on normal vectors only. There are multiple solutions:

- Multiply by the inverse of the scale matrix. With $M = T * R * S$ the scale is applied first, so its inverse has to go on the right: $N_{ws} = M * S^{-1} * N_{ls}$, which leaves only the rotation acting on the normal;
- Don’t multiply by the full transformation matrix in the first place; instead, just multiply the local space normals by the rotation matrix, $N_{ws} = R * N_{ls}$;
- A classic trick called the “inverse transpose normal matrix”, which means multiplying the local space normals by the transpose of the inverse of the model’s local-to-world transformation matrix, $N_{ws} = M^{-1T} * N_{ls}$. Since it looks less intuitive, you may ask why this works. The inverse turns the scaling into its reciprocal, and because the scaling part sits on the diagonal of the matrix (the coordinate system’s axes, actually), the transpose doesn’t affect it. A rotation matrix is orthogonal, so its inverse is its transpose; inverting it and then transposing it is the same as inverting it twice, which means nothing happens to the rotation part! In formulas: from $M = T * R * S$ we get $M^{-1} = S^{-1} * R^{-1} * T^{-1}$, and the transpose reverses the order again, $M^{-1T} = T^{-1T} * R^{-1T} * S^{-1T}$. Since $S^{-1T} = S^{-1}$ and $R^{-1} = R^{T}$, then $M^{-1T} = T^{-1T} * R * S^{-1}$. In practice we often shrink the 4x4 matrix to 3x3 and just use the upper-left part, which removes the useless translation components: $M_{3 \times 3}^{-1T} = R * S^{-1}$. Note that this is not a pure rotation: the normal is scaled by the reciprocal and then rotated, which is exactly what keeps it perpendicular to a non-uniformly scaled surface (it has to be renormalized afterwards). Methods 1 and 2 give the same direction as method 3 only when the scale is uniform.
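To see the difference between these options numerically, here is a minimal Python sketch using plain lists and no libraries. Everything in it is made up for illustration: the helper names, a 90 degree rotation about Z, a scale of 2 along X, and one tangent/normal pair. It checks which transformed normal stays perpendicular to the transformed surface tangent.

```python
def mat_mul(a, b):
    # 3x3 matrix product, matrices stored as m[row][col].
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(m, v):
    # Column-vector convention: result = m * v.
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def inverse(m):
    # 3x3 inverse via the adjugate (cofactor) formula.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about Z
S = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # non-uniform scale
M = mat_mul(R, S)                                         # upper-left 3x3 of T * R * S

tangent_ls = [1.0, 1.0, 0.0]   # a direction lying in the local surface
normal_ls = [1.0, -1.0, 0.0]   # perpendicular to it in local space

tangent_ws = mat_vec(M, tangent_ls)
naive = mat_vec(M, normal_ls)                                  # full matrix
rotation_only = mat_vec(R, normal_ls)                          # rotation only
inverse_transpose = mat_vec(transpose(inverse(M)), normal_ls)  # inverse transpose

naive_dot = dot(tangent_ws, naive)                      # 3.0, not perpendicular
rotation_dot = dot(tangent_ws, rotation_only)           # 1.0, not perpendicular
inverse_transpose_dot = dot(tangent_ws, inverse_transpose)  # 0.0, perpendicular
```

With a uniform scale (for example `S` set to 2 on the whole diagonal) all three dot products drop to zero, which is the case where the rotation-only shortcut is safe.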

Since most rendering pipelines (as far as I know) are designed to deal with scene transforms on the CPU side and always pass the already-multiplied transformation matrix to the GPU, method 3 is the one commonly used. But HLSL doesn’t provide a native inverse function like GLSL does, which means you either have to write a special hard-coded version for this purpose or invert the matrix beforehand on the CPU side. Given this small drawback, method 2 looks like another good choice because you don’t need any inverse operation on the GPU; the cost becomes one additional matrix passed to the GPU in a cbuffer. Keep in mind that it only matches method 3 while the scale stays uniform. Currently I use method 2, and unless it hits me with some painful issue I don’t think I’ll switch to method 3 in DirectX.

## Messy micro

And then normal mapping became a little bit annoying. Bump mapping, the idea of storing geometric detail in a texture and using it to perturb the surface normal, was invented by Blinn in the ’70s; what we commonly call normal mapping now is one kind of bump mapping technique that stores the perturbed normals themselves, usually in the surface/vertex’s tangent space. It gives a huge improvement in surface detail without adding more vertices and hurting performance. Since the common practice is to store the normal texture in tangent space, we can’t apply it to our vertex normals unless we know the transformation from tangent space to the model’s local/world space. This means we need to construct a space that treats the vertex normal as the positive Z axis, then either transform the normal texture data to world space, or transform the light/position data into tangent space with the inverse matrix. Again there are different approaches:

- Use a precomputed tangent vector together with the normal vector to calculate the bitangent vector, and use these three as the axes of a tangent/TBN space;
- Compute the tangent vector on the fly from the texture UV coordinates and the vertex data. A triangle edge can be computed from the vertex positions, and at the same time it can be expressed with the TBN space’s T and B axes and the UV differences along that edge, so we can combine the two to get T and B. In formulas, $\vec{E} = \vec{p_1} - \vec{p_2}$ and $\vec{E} = \Delta U \vec{T} + \Delta V \vec{B}$; writing this down for two edges of the triangle gives a solvable linear system.
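The linear system from the second approach can be sketched on the CPU like this. It is only an illustration in Python under my own assumptions: both edges are taken from the first vertex (`p1 - p0` and `p2 - p0`), the function name `tangent_frame` and the sample triangle are hypothetical, and T and B are normalized independently without re-orthogonalizing them against the normal.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(len(a))]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def tangent_frame(p0, p1, p2, uv0, uv1, uv2):
    # Two edges of the triangle and the UV differences along them.
    e1, e2 = sub(p1, p0), sub(p2, p0)
    du1, dv1 = sub(uv1, uv0)
    du2, dv2 = sub(uv2, uv0)
    # E1 = du1*T + dv1*B and E2 = du2*T + dv2*B, so invert the 2x2 UV matrix.
    det = du1 * dv2 - du2 * dv1
    if det == 0.0:
        raise ValueError("degenerate UV mapping, T and B are undefined")
    t = [(dv2 * e1[i] - dv1 * e2[i]) / det for i in range(3)]
    b = [(du1 * e2[i] - du2 * e1[i]) / det for i in range(3)]
    return normalize(t), normalize(b)

# A triangle in the XY plane whose UVs are stretched: u runs along +X, v along +Y.
T, B = tangent_frame(
    [0.0, 0.0, 0.0], [2.0, 3.0, 0.0], [-2.0, 3.0, 0.0],
    [0.0, 0.0], [1.0, 1.0], [-1.0, 1.0],
)
```

For this triangle T comes out as +X and B as +Y, which matches how the UVs were laid out.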

In practice, if we choose method 1, we have to precompute the tangent vector, store it offline somewhere, and pass it to the GPU at runtime. If we choose method 2 and calculate it on the CPU side before sending it to the GPU, it basically has no implementation difference from method 1. I chose method 2 but implemented it directly in the shader rather than computing it on the CPU as originally planned; it spends less bandwidth and makes the vertex data structure tighter. The idea comes from a nice blog post: it uses the built-in shader partial derivative functions to get the screen space gradients of position and UV, then does some cross products to get the final result. But since OpenGL users commonly use a right-handed coordinate system (RHS) and DirectX users commonly use a left-handed one (LHS), and because their window/texture coordinate systems differ as well, it confused me a lot at the beginning.

OpenGL’s texture coordinates start from the **bottom-left** corner, and when you submit a 2D texture data array (1D in memory) to OpenGL, it fills the texture from the **bottom-left** corner upward. In DirectX the texture coordinates start from the **top-left** corner, and the texture is filled from the **top-left** corner downward. That means if you sample the same texture data with the **same** coordinates, OpenGL and DirectX return the **same** result. So for every loaded texture I don’t need to change anything related to UV coordinates. Misunderstanding this led me to write a wrong UV conversion in the parser at first.
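A toy model makes the cancellation visible. The Python sketch below is not either API, just a hypothetical stand-in with nearest-neighbor sampling: each "upload" function turns the rows in memory into a top-down picture according to that API’s convention, and each "sample" function measures v from that API’s origin.

```python
def upload_opengl(rows):
    # OpenGL treats the first row in memory as the bottom row of the image.
    return rows[::-1]  # kept here as a top-down picture

def sample_opengl(picture, u, v):
    # v = 0 is the bottom edge, so count rows upward from the bottom.
    height, width = len(picture), len(picture[0])
    row_from_bottom = min(int(v * height), height - 1)
    col = min(int(u * width), width - 1)
    return picture[height - 1 - row_from_bottom][col]

def upload_direct3d(rows):
    # Direct3D treats the first row in memory as the top row of the image.
    return list(rows)

def sample_direct3d(picture, u, v):
    # v = 0 is the top edge, so count rows downward from the top.
    height, width = len(picture), len(picture[0])
    row_from_top = min(int(v * height), height - 1)
    col = min(int(u * width), width - 1)
    return picture[row_from_top][col]

# The same bytes handed to both APIs, labelled by their position in memory.
memory = [["r0c0", "r0c1"],
          ["r1c0", "r1c1"]]
gl_picture = upload_opengl(memory)
dx_picture = upload_direct3d(memory)

coords = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
results = [(sample_opengl(gl_picture, u, v), sample_direct3d(dx_picture, u, v))
           for u, v in coords]
```

Every pair in `results` holds the same texel twice: the flipped fill direction and the flipped v axis cancel each other, so the UVs can stay untouched.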

Then the partial derivative functions. These are easier to map from OpenGL to DirectX: `ddx_fine()` and `ddy_fine()` play the same role as `dFdx()` and `dFdy()` (strictly speaking, GLSL lets the implementation choose the precision of `dFdx()`/`dFdy()`; `dFdxFine()`/`dFdyFine()` from GLSL 4.50 are the exact counterparts). The T and B axes construction is a little bit tricky, though: I chose to use LHS in DirectX, so I have to flip T and B while the other implementation details stay the same as in OpenGL. Also, the final 3x3 TBN matrices in OpenGL and DirectX are the transpose of each other, but each one stays consistent with its own API’s math convention, so I just need to follow the usual matrix-vector multiplication rules on each side.
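The transpose remark can be checked with a few lines of Python that mimic the two conventions. This is only a sketch with hypothetical helper names: GLSL’s `mat3(T, B, N)` takes its arguments as columns and multiplies a column vector on the right, while HLSL’s `float3x3(T, B, N)` takes them as rows and `mul(v, M)` multiplies a row vector on the left. The vectors are arbitrary numbers, not a real tangent frame.

```python
def glsl_mat3_times_vec(c0, c1, c2, v):
    # mat3(c0, c1, c2) stores its arguments as columns; M * v is a column-vector product.
    m = [[c0[r], c1[r], c2[r]] for r in range(3)]  # m[row][col]
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def hlsl_mul_vec_float3x3(v, r0, r1, r2):
    # float3x3(r0, r1, r2) stores its arguments as rows; mul(v, M) treats v as a row vector.
    m = [list(r0), list(r1), list(r2)]  # m[row][col]
    return [sum(v[r] * m[r][c] for r in range(3)) for c in range(3)]

T = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
N = [7.0, 8.0, 10.0]
v = [0.5, -1.0, 2.0]  # stands in for the decoded normal map sample

glsl_result = glsl_mat3_times_vec(T, B, N, v)
hlsl_result = hlsl_mul_vec_float3x3(v, T, B, N)
```

Both calls compute `v.x * T + v.y * B + v.z * N`, so the two matrices are transposes of each other in memory while the transformed vector is identical.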

Here are the relevant code snippets, first the GLSL version and then the HLSL one:

```
// get edge vectors of the pixel triangle
vec3 dp1 = dFdx(thefrag_WorldSpacePos.xyz);
vec3 dp2 = dFdy(thefrag_WorldSpacePos.xyz);
vec2 duv1 = dFdx(thefrag_TexCoord);
vec2 duv2 = dFdy(thefrag_TexCoord);
// solve the linear system
vec3 N = normalize(thefrag_Normal);
vec3 dp2perp = cross(dp2, N);
vec3 dp1perp = cross(N, dp1);
vec3 T = normalize(dp2perp * duv1.x + dp1perp * duv2.x);
vec3 B = normalize(dp2perp * duv1.y + dp1perp * duv2.y);
mat3 TBN = mat3(T, B, N);
vec3 WorldSpaceNormal = normalize(TBN * (texture(uni_normalTexture, thefrag_TexCoord).rgb * 2.0 - 1.0));
```

```
// get edge vectors of the pixel triangle
float3 dp1 = ddx_fine(input.thefrag_WorldSpacePos);
float3 dp2 = ddy_fine(input.thefrag_WorldSpacePos);
float2 duv1 = ddx_fine(input.thefrag_TexCoord);
float2 duv2 = ddy_fine(input.thefrag_TexCoord);
// solve the linear system
float3 N = normalize(input.thefrag_Normal);
float3 dp2perp = cross(dp2, N);
float3 dp1perp = cross(N, dp1);
float3 T = -normalize(dp2perp * duv1.x + dp1perp * duv2.x);
float3 B = -normalize(dp2perp * duv1.y + dp1perp * duv2.y);
float3x3 TBN = float3x3(T, B, N);
float3 normalInWorldSpace = normalize(mul(t2d_normal.Sample(SampleTypeWrap, input.thefrag_TexCoord).rgb * 2.0f - 1.0f, TBN));
```
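One way to sanity-check this kind of shader math offline is to replay it on the CPU with hand-made derivatives. The Python sketch below ports the GLSL listing step by step; the derivative values are invented (a flat surface in the XY plane with u running along +X and v along +Y, seen through an arbitrary screen-space mapping), and the helper names are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def add_scaled(a, s, b, t):
    return [a[i] * s + b[i] * t for i in range(3)]

def cotangent_frame(n, dp1, dp2, duv1, duv2):
    # Same steps as the GLSL listing; dp* and duv* stand in for dFdx/dFdy results.
    n = normalize(n)
    dp2perp = cross(dp2, n)
    dp1perp = cross(n, dp1)
    t = normalize(add_scaled(dp2perp, duv1[0], dp1perp, duv2[0]))
    b = normalize(add_scaled(dp2perp, duv1[1], dp1perp, duv2[1]))
    return t, b, n

def tbn_times(t, b, n, v):
    # mat3(T, B, N) * v with T, B, N as columns.
    return [t[i] * v[0] + b[i] * v[1] + n[i] * v[2] for i in range(3)]

# Assumed surface: position changes by (2,0,0) per unit u and (0,3,0) per unit v.
dp_du, dp_dv = [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]
duv1 = [0.1, 0.02]    # made-up UV change per pixel step in x
duv2 = [-0.03, 0.05]  # made-up UV change per pixel step in y
dp1 = add_scaled(dp_du, duv1[0], dp_dv, duv1[1])
dp2 = add_scaled(dp_du, duv2[0], dp_dv, duv2[1])

T, B, N = cotangent_frame([0.0, 0.0, 1.0], dp1, dp2, duv1, duv2)

# A "flat" normal map texel (0.5, 0.5, 1.0) decodes to (0, 0, 1) in tangent space.
texel = [0.5, 0.5, 1.0]
tangent_space_normal = [c * 2.0 - 1.0 for c in texel]
world_normal = normalize(tbn_times(T, B, N, tangent_space_normal))
```

With these inputs T lands on +X, B on +Y, and the flat texel maps back to the interpolated vertex normal, which is what a correct TBN frame should do.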