A (not) tedious work
Recently I started porting my project to DirectX 11, and it has a lot of interesting differences from OpenGL, such as the coordinate and matrix conventions, stricter type safety requirements, better shader resource management (I haven't tried OpenGL's SSBO yet, but the constant buffer is really easy to wrap into an elegant layer), and so on. One thing I got stuck on for a while was normal mapping, since I had used on-the-fly tangent generation in GLSL, and it quite confused me at first when I rewrote it in HLSL.
Always Review
Basically the normal vector is interpreted as a unit direction vector perpendicular to a surface (or, mathematically speaking, for a level surface $f\left( {x,y,z} \right) = k$ of a scalar field, the normal vector $\vec{n}$ at a point $\left( {{x_0},{y_0},{z_0}} \right)$ is the gradient $\nabla f = \left\langle {{f_x},{f_y},{f_z}} \right\rangle$ evaluated there, which is always orthogonal/normal to the level surface), and with this surface normal we can achieve some old-style flat shading which only gives discrete surface colors. Later Gouraud (maybe not him) invented the vertex normal, the average of the normals of the surfaces that share the vertex; with some interpolation in the pixel processing stage, it gives a smoother shading result.
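As a quick sanity check of the gradient view, here is a small sketch (plain Python; the function and names are mine, not from any renderer) comparing the analytic gradient of $f(x,y,z) = x^2 + y^2 + z^2$ with a central-difference approximation at a point on the unit sphere, where the gradient points radially outward along the surface normal:

```python
# Gradient of f(x, y, z) = x^2 + y^2 + z^2; the level set f = 1 is the
# unit sphere, and the analytic gradient <2x, 2y, 2z> points radially
# outward, i.e. along the sphere's surface normal at that point.

def f(x, y, z):
    return x * x + y * y + z * z

def numerical_gradient(f, p, h=1e-6):
    # Central differences in each axis direction.
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )

p = (0.6, 0.8, 0.0)                        # a point on the unit sphere
analytic = (2 * p[0], 2 * p[1], 2 * p[2])  # radial direction
numeric = numerical_gradient(f, p)
print(all(abs(a - n) < 1e-4 for a, n in zip(analytic, numeric)))  # True
```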
Typically we get the precomputed normal vectors of a model from DCC tools such as Blender, 3ds Max, or Maya (or generate them with some cross products on the CPU side on your own), and they are stored in the local space of the model. Then if we want to render anything with their help in world space (strictly speaking, finally in screen space), we need to transform these normal vectors.
The first idea is just to use the model's local-to-world transformation matrix to transform the normal vector: since it represents a direction, its w component is 0 in homogeneous 4D space, so the translation part (the last column of the transformation matrix) has no effect on it. The rotation part is what we want to apply, but if we apply a non-uniform scale to the model, then the normal vector gets sheared too, and its direction changes! So we need to figure out how to cancel the unwanted scaling on the normal vectors only. There are multiple solutions:
1. Multiply with the inverse of the scale matrix to cancel the scale applied by $M$: $N_{ws} = M * S^{-1} * N_{ls}$;
2. Skip the full transformation matrix in the first place and instead multiply the local space normals by only the rotation matrix: $N_{ws} = R * N_{ls}$;
3. A classic trick called the "inverse transpose normal matrix", which means using the transpose of the inverse of the model's local-to-world transformation matrix to multiply the local space normals: $N_{ws} = (M^{-1})^{T} * N_{ls}$. Since it looks less intuitive, you may ask why this works. With the inverse operation we cancel the scaling, and because the scaling part sits on the diagonal (the coordinate system's axes, actually), the transpose operation doesn't affect it at all. And since a rotation matrix is orthogonal, its inverse is its transpose, so the transpose operation just inverts the rotation part again; inverting and then transposing is the same as inverting twice, which means nothing happens to the rotation part! Let's write it as a formula: with $M = T * R * S$, we have $(M^{-1})^{T} = (S^{-1} * R^{-1} * T^{-1})^{T} = (T^{-1})^{T} * (R^{-1})^{T} * (S^{-1})^{T}$ (remember the transpose of a product reverses the order); since $(S^{-1})^{T} = S^{-1}$ and $(R^{-1})^{T} = R$, this gives $(M^{-1})^{T} = (T^{-1})^{T} * R * S^{-1}$. In practice we often shrink the 4x4 matrix to 3x3 by taking the upper-left part, which removes the useless translation components, leaving $(M_{3 \times 3}^{-1})^{T} = R * S^{-1}$, and with this matrix we achieve our goal.
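To see numerically why the plain model matrix fails under non-uniform scale while the inverse transpose works, here is a minimal sketch (plain Python, 3x3 only; the helper names and the example vectors are mine). A tangent lying in the surface and its normal stay perpendicular only when the normal goes through the inverse transpose:

```python
# Tiny 3x3 helpers; matrices are row-major lists of rows.
def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Non-uniform scale S = diag(2, 1, 1); its inverse is diag(0.5, 1, 1),
# and transposing a diagonal matrix changes nothing.
S = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
S_inv_T = transpose([[0.5, 0, 0], [0, 1, 0], [0, 0, 1]])

tangent = [1, 1, 0]   # lies in the surface
normal = [-1, 1, 0]   # perpendicular to the tangent

t_ws = mat_vec(S, tangent)        # tangents transform with the model matrix
naive = mat_vec(S, normal)        # wrong: the scale shears the normal
fixed = mat_vec(S_inv_T, normal)  # right: inverse transpose

print(dot(t_ws, naive))  # -3, no longer perpendicular
print(dot(t_ws, fixed))  # 0.0, still perpendicular
```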
Since most rendering pipelines (as far as I know) are designed to handle scene transforms on the CPU side and always pass the already-multiplied transformation matrix to the GPU, method 3 is commonly used. But HLSL doesn't provide a native inverse function like GLSL does, which means you either have to build a special hardcoded one for this purpose or invert the matrix beforehand on the CPU side. Given this little con, method 2 looks like another good choice, because you don't need any inverse operation; the cost becomes an additional piece of cbuffer data passed to the GPU. Currently I choose method 2; unless it hits me with some painful issues, I don't think I'll change to method 3 in DX.
Messy micro
And then the normal mapping became a little bit annoying. The technique of storing geometry detail as offsets in the surface/vertex's tangent space was invented by Blinn in the 70s; we commonly call it normal mapping now. It is one kind of bump mapping technique which gives a huge improvement in surface detail without adding more vertices that would hurt performance. Since the common practice is to store the normal texture in tangent space, unless we have the tangent-space-to-local/world-space transformation, we can't apply it to our vertex normals. This means we need to construct a space which treats the vertex normal as the positive Z axis, then transform the normal texture data to world space, or transform the other light/position data to tangent space with its inverse matrix. There are a few different approaches:

1. Use a precomputed tangent vector combined with the normal vector to calculate the bitangent vector, and use these 3 as the axes to construct a tangent/TBN space;

2. Compute the tangent vector on the fly using the texture UV coordinates and vertex data. An edge of the triangle can be calculated from the vertex positions, and at the same time it can be expressed in the TBN space's T and B axes via the UV deltas, so we can combine the two to solve for T and B. In formulas: $\vec{E_1} = \vec{p_2} - \vec{p_1}$ and $\vec{E_1} = \Delta U_1 \vec{T} + \Delta V_1 \vec{B}$ (likewise for the second edge $\vec{E_2}$), which is a solvable linear algebra problem.
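Method 2's linear system can be checked offline with a tiny solver (plain Python; the triangle and UVs are made-up example data, and the function name is mine). Two triangle edges and their UV deltas give two equations $\vec{E_i} = \Delta U_i \vec{T} + \Delta V_i \vec{B}$, solved per component with Cramer's rule on the 2x2 UV-delta matrix:

```python
# Solve E1 = dU1*T + dV1*B and E2 = dU2*T + dV2*B for the tangent T
# and bitangent B, one xyz component at a time, via Cramer's rule.
def solve_tb(e1, e2, duv1, duv2):
    du1, dv1 = duv1
    du2, dv2 = duv2
    det = du1 * dv2 - du2 * dv1  # determinant of the 2x2 UV-delta matrix
    t = [(dv2 * e1[i] - dv1 * e2[i]) / det for i in range(3)]
    b = [(du1 * e2[i] - du2 * e1[i]) / det for i in range(3)]
    return t, b

# Made-up triangle in the XY plane with axis-aligned UVs: the tangent
# should come out along +X and the bitangent along +Y.
p0, p1, p2 = [0, 0, 0], [1, 0, 0], [0, 1, 0]
uv0, uv1, uv2 = (0, 0), (1, 0), (0, 1)

e1 = [p1[i] - p0[i] for i in range(3)]
e2 = [p2[i] - p0[i] for i in range(3)]
duv1 = (uv1[0] - uv0[0], uv1[1] - uv0[1])
duv2 = (uv2[0] - uv0[0], uv2[1] - uv0[1])

t, b = solve_tb(e1, e2, duv1, duv2)
print(t, b)  # [1.0, 0.0, 0.0] [0.0, 1.0, 0.0]
```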
In practice, if we choose method 1, we have to precompute the tangent vectors, store them offline somewhere, and pass them to the GPU at runtime. If we choose method 2, we need to calculate them on the CPU side and then send them to the GPU, so basically there is no implementation difference from method 1. I chose method 2 but implemented it in the shader directly, rather than the original idea of computing it on the CPU; it spends less bandwidth and makes the vertex data structure tighter. The idea comes from a nice blog post: it utilizes the built-in shader partial derivative functions to calculate the screen space gradients, then does some cross products to get the final result. But since OpenGL commonly uses a right-handed coordinate system and DirectX a left-handed one, and also because of their different window/texture coordinate conventions, it confused me a lot.
The texture coordinates of OpenGL start from the bottom-left corner, and when submitting a 2D texture data array (1D in memory) to OpenGL, it fills the texture from the bottom-left corner upwards; in DirectX the texture coordinates start from the top-left corner, and the texture buffer is filled from the top-left corner downwards. That means if you sample a texture with the same coordinates, OpenGL and DirectX return the same result, and in turn that for all the loaded texture data I don't need to change anything related to the UV coordinates. Misinterpreting this led me to write a wrong UV-converting parser at first.
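A toy model of the two conventions (plain Python; `sample_gl`/`sample_dx` are my own nearest-texel stand-ins, not real API calls) shows why the two differences cancel out and no UV flip is needed:

```python
# The uploaded buffer is a flat list of texel rows, first row first.
rows = ["row0", "row1"]

def sample_gl(rows, v):
    # GL: v = 0 is the bottom edge, and the first uploaded row is
    # placed at the bottom, so rows[0] already sits at v = 0.
    idx = min(int(v * len(rows)), len(rows) - 1)
    return rows[idx]

def sample_dx(rows, v):
    # DX: v = 0 is the top edge, and the first uploaded row is
    # placed at the top, so rows[0] also sits at v = 0.
    idx = min(int(v * len(rows)), len(rows) - 1)
    return rows[idx]

# Both origins coincide with where the first uploaded row landed,
# so the same v hits the same data in both APIs.
print(sample_gl(rows, 0.0), sample_dx(rows, 0.0))  # row0 row0
print(sample_gl(rows, 0.9), sample_dx(rows, 0.9))  # row1 row1
```

If only one of the two conventions differed (origin or fill order), a `1 - v` flip would be required; because both differ, they cancel.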
Then the partial derivative functions: these are easy to map from GL to DX, since ddx_fine() and ddy_fine() are exactly the same as dFdx() and dFdy(). But the construction of the T and B axes is a little bit tricky: I chose to use a left-handed system in DX, so I have to flip T and B while keeping the other implementation details the same as in GL. Also, the final 3x3 TBN matrices in GL and DX end up as transposes of each other while each stays consistent with its own side's math convention, so I just need to follow the usual matrix-vector multiplication rules on each side, as always.
Here are the relevant code pieces, GLSL first and then HLSL:
```glsl
// get edge vectors of the pixel triangle
vec3 dp1 = dFdx(thefrag_WorldSpacePos.xyz);
vec3 dp2 = dFdy(thefrag_WorldSpacePos.xyz);
vec2 duv1 = dFdx(thefrag_TexCoord);
vec2 duv2 = dFdy(thefrag_TexCoord);
// solve the linear system
vec3 N = normalize(thefrag_Normal);
vec3 dp2perp = cross(dp2, N);
vec3 dp1perp = cross(N, dp1);
vec3 T = normalize(dp2perp * duv1.x + dp1perp * duv2.x);
vec3 B = normalize(dp2perp * duv1.y + dp1perp * duv2.y);
// T, B, N become the columns of the GLSL TBN matrix
mat3 TBN = mat3(T, B, N);
// unpack the [0, 1] texture data to [-1, 1] and transform to world space
vec3 WorldSpaceNormal = normalize(TBN * (texture(uni_normalTexture, thefrag_TexCoord).rgb * 2.0 - 1.0));
```
```hlsl
// get edge vectors of the pixel triangle
float3 dp1 = ddx_fine(input.thefrag_WorldSpacePos);
float3 dp2 = ddy_fine(input.thefrag_WorldSpacePos);
float2 duv1 = ddx_fine(input.thefrag_TexCoord);
float2 duv2 = ddy_fine(input.thefrag_TexCoord);
// solve the linear system
float3 N = normalize(input.thefrag_Normal);
float3 dp2perp = cross(dp2, N);
float3 dp1perp = cross(N, dp1);
float3 T = normalize(dp2perp * duv1.x + dp1perp * duv2.x);
float3 B = normalize(dp2perp * duv1.y + dp1perp * duv2.y);
// T, B, N become the rows of the HLSL TBN matrix
float3x3 TBN = float3x3(T, B, N);
// unpack and transform; mul(v, M) treats v as a row vector, matching the row layout
float3 normalInWorldSpace = normalize(mul(t2d_normal.Sample(SampleTypeWrap, input.thefrag_TexCoord).rgb * 2.0f - 1.0f, TBN));
```
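The transpose relation between the two TBN matrices can be checked numerically (plain Python sketch; the T/B/N values are made up). GLSL's `mat3(T, B, N)` stores the vectors as columns and uses `TBN * n`, while HLSL's `float3x3(T, B, N)` stores them as rows and uses `mul(n, TBN)`; both reduce to the same linear combination `n.x*T + n.y*B + n.z*N`:

```python
# T, B, N as plain 3-vectors; n is a tangent-space normal sample.
T = [1.0, 0.0, 0.0]
B = [0.0, 0.0, 1.0]
N = [0.0, 1.0, 0.0]
n = [0.2, 0.3, 0.9]

# GLSL: mat3(T, B, N) puts the vectors in the matrix *columns*,
# and TBN * n multiplies a column vector on the right.
def glsl_mat3_times_vec(t, b, nrm, v):
    return [t[i] * v[0] + b[i] * v[1] + nrm[i] * v[2] for i in range(3)]

# HLSL: float3x3(T, B, N) puts the vectors in the matrix *rows*,
# and mul(v, M) multiplies a row vector on the left.
def hlsl_mul_vec_mat(v, rows):
    return [sum(v[k] * rows[k][i] for k in range(3)) for i in range(3)]

gl_result = glsl_mat3_times_vec(T, B, N, n)
dx_result = hlsl_mul_vec_mat(n, [T, B, N])
print(gl_result == dx_result)  # True
```

So despite the two matrices being transposes of each other in memory, following each API's native multiplication order yields identical world space normals.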