(Spoiler-free post!)
Quite a lot of people I know inside or outside this industry are playing the long-awaited “gargantuan maelstrom” now. After a couple of hours immersed in the stunning views (and bugs, of course) of Night City, I started wondering how one frame is rendered in Cyberpunk 2077 (every graphics programmer’s instinct!). So I opened RenderDoc and PIX, and luckily REDengine didn’t behave as unfriendly as some other triple-A titles (yes, Watch Dogs 2, I mean you); I got a frame capture without any problems. (Disclaimer: I’m pretty sure others like Alain Galvan, Anton Schreiner, Adrian Courrèges or even the folks from CDPR will later bring us a more detailed and accurate demonstration of how things work. I’m just chilling around and welcome any discussions and corrections!)
I’m running the game with Ultra settings at 1080p, without ray tracing and DLSS enabled, for the sake of my years-old GTX 1070.
Two captured final results: one around the main character’s apartment during the day (let’s name it the main street) and another one around a river bank at night (nobody dislikes eye candy!):
There is one graphics queue, one compute queue, and one copy queue; the GPU work balance is quite even
ID3D12GraphicsCommandList::CopyBufferRegion is frequently called between passes to copy RT results
Quite a lot of D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS removal recommendations were thrown out by PIX. I’m not sure if it’s just that my capture happened not to use some resources, or whether they really do need some optimization (I’ve had experiences where unnecessary unordered access hurt quite a bit)
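As a side note, this is roughly what PIX is complaining about: a render target created with the unordered-access flag even though no shader ever writes to it as a UAV. A minimal sketch below, with made-up resource parameters (the engine-side code is of course unknown to me):

#include <d3d12.h>

// Hypothetical helper: describe a 1080p render target.
// PIX's recommendation boils down to dropping ALLOW_UNORDERED_ACCESS
// when the texture is never actually bound as a UAV.
D3D12_RESOURCE_DESC DescribeRenderTarget(bool needsUAV)
{
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    desc.Width = 1920;
    desc.Height = 1080;
    desc.DepthOrArraySize = 1;
    desc.MipLevels = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
    desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
    if (needsUAV)
        desc.Flags |= D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS; // only when a shader really writes to it
    return desc;
}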
Frame Timeline: intensive at some early geometry stages, then the deferred work kicks in
The entire frame starts by preparing the resources for the in-game billboards and screens
The first few copy operations are executed on 3 textures: the destination textures’ format is DXGI_FORMAT_R8_UNORM; the first one is 480x272 and the other two are at half that size. The source texture should be a mega-texture, which is the destination of lots of later copy operations. (Question: what’s the purpose of these copies? The night scene doesn’t have them, so is it related to streaming?)
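For reference, a region copy between two R8_UNORM textures would look roughly like this; the resource names and the destination offset are made up, since the capture only shows anonymized resources:

#include <d3d12.h>

// Hypothetical: copy a 480x272 R8_UNORM texture into some region of a larger
// atlas-like "mega texture" (both already in COPY_SOURCE / COPY_DEST states).
void CopyIntoMegaTexture(ID3D12GraphicsCommandList* cmdList,
                         ID3D12Resource* megaTexture,  // destination atlas
                         ID3D12Resource* smallTexture, // 480x272 source
                         UINT dstX, UINT dstY)         // where in the atlas to place it
{
    D3D12_TEXTURE_COPY_LOCATION dst = {};
    dst.pResource = megaTexture;
    dst.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
    dst.SubresourceIndex = 0;

    D3D12_TEXTURE_COPY_LOCATION src = {};
    src.pResource = smallTexture;
    src.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
    src.SubresourceIndex = 0;

    // Copy the whole source into the atlas at (dstX, dstY).
    cmdList->CopyTextureRegion(&dst, dstX, dstY, 0, &src, nullptr);
}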
One of the three very first copied textures:
Billboards are drawn to one texture where all the mips are tightly packed together
Billboards:
Then the elements for the cyber-ish GUI are drawn and also copied to the mega-texture
Part of the GUI elements:
A 2k D16_UNORM texture, with around 400 instanced draw calls executed in the captured scene (I’m not sure how this RT is used later).
There are 2 dispatches over a 64x64x64 R16G16B16A16_FLOAT 3D texture in this pass (looks like voxelized normals/positions?). The thread group count is 2x2x64 the first time and 16x16x16 the second time (the RT is used later multiple times in the terrain pass, ocean wave pass, depth pre-pass, and base material pass, so it’s definitely something crucial for geometry information)
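The two group counts are consistent with covering the 64x64x64 volume exactly once per dispatch, just with different thread-group layouts; my guesses are [numthreads(32, 32, 1)] for the first one and [numthreads(4, 4, 4)] for the second. A tiny sanity check:

#include <cstdio>

// Sanity check: both observed dispatches cover a 64x64x64 volume,
// assuming (guessed) thread-group sizes of 32x32x1 and 4x4x4 respectively.
int main()
{
    const int volume[3]   = { 64, 64, 64 };

    const int groupsA[3]  = { 2, 2, 64 };   // first dispatch
    const int threadsA[3] = { 32, 32, 1 };  // assumed [numthreads(32, 32, 1)]

    const int groupsB[3]  = { 16, 16, 16 }; // second dispatch
    const int threadsB[3] = { 4, 4, 4 };    // assumed [numthreads(4, 4, 4)]

    for (int i = 0; i < 3; ++i)
    {
        std::printf("axis %d: %d == %d == %d\n", i,
                    volume[i],
                    groupsA[i] * threadsA[i],
                    groupsB[i] * threadsB[i]);
    }
    return 0;
}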
Some slices of the 3D Normal:
ByteAddressBuffer and StructuredBuffer resources are bound as UAVs (all of them are used later in the base material pass).
One 2-slice R8G8B8A8_UNORM 2D texture array and one R16_UINT 2D texture are drawn at 1k resolution (they look like chunks of terrain)
2 draw calls and RTs:
R16_UINT 2D texture:
The top-view ocean map mask and a generated wave noise texture are bound as inputs, and the RT format also changes to R16_UNORM. There is no tessellation shader stage; some regular grid plate meshes are used to directly draw the different chunks
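Since there is no tessellation involved, each water chunk is presumably just a pre-built flat grid that the vertex shader displaces with the wave noise. A rough idea of such a grid plate (resolution and extent are made up, the real chunk layout in REDengine is unknown to me):

#include <cstdint>
#include <vector>

struct GridPlate
{
    std::vector<float> positions;         // x, y(=0), z per vertex; height comes from the wave texture in the VS
    std::vector<std::uint32_t> indices;
};

// Build an N x N quad grid covering `size` x `size` units, centered at the origin.
GridPlate BuildGridPlate(std::uint32_t n, float size)
{
    GridPlate plate;
    const float step = size / static_cast<float>(n);
    const float half = size * 0.5f;

    for (std::uint32_t z = 0; z <= n; ++z)
        for (std::uint32_t x = 0; x <= n; ++x)
        {
            plate.positions.push_back(x * step - half);
            plate.positions.push_back(0.0f);
            plate.positions.push_back(z * step - half);
        }

    for (std::uint32_t z = 0; z < n; ++z)
        for (std::uint32_t x = 0; x < n; ++x)
        {
            const std::uint32_t i0 = z * (n + 1) + x;
            const std::uint32_t i1 = i0 + 1;
            const std::uint32_t i2 = i0 + (n + 1);
            const std::uint32_t i3 = i2 + 1;
            // two triangles per quad
            plate.indices.insert(plate.indices.end(), { i0, i2, i1, i1, i2, i3 });
        }
    return plate;
}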
Ocean wave noise in R16G16B16A16_FLOAT and mask in BC7_SRGB:
The RT is in D32_S8_TYPELESS format; surely it helps to optimize the later pixel-heavy work
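The usual payoff of a depth pre-pass is that the later base material pass can run with depth writes off and an EQUAL depth test, so the expensive pixel shaders only execute on visible surfaces. A minimal sketch of the two depth states under that assumption (I haven’t inspected the actual PSOs in detail, and the stencil setup of the G-Buffer pass is omitted here):

#include <d3d12.h>

// Depth pre-pass: write depth, typically with no (or a very cheap) pixel shader bound.
D3D12_DEPTH_STENCIL_DESC DepthPrepassState()
{
    D3D12_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable = TRUE;
    desc.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ALL;
    desc.DepthFunc = D3D12_COMPARISON_FUNC_LESS_EQUAL;
    return desc; // stencil left disabled for brevity
}

// Base material pass: reuse the pre-pass depth, no writes; only pixels that
// exactly match the stored depth survive.
D3D12_DEPTH_STENCIL_DESC BaseMaterialDepthState()
{
    D3D12_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable = TRUE;
    desc.DepthWriteMask = D3D12_DEPTH_WRITE_MASK_ZERO;
    desc.DepthFunc = D3D12_COMPARISON_FUNC_EQUAL;
    return desc;
}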
Depth pre-pass result:
// VB0
nshort4 POSITION0;
// VB1
half2 TEXCOORD0;
// VB2
xint NORMAL0;
xint TANGENT0;
// VB3
unbyte4 COLOR0;
half2 TEXCOORD1;
// VB7
float4 INSTANCE_TRANSFORM0;
float4 INSTANCE_TRANSFORM1;
float4 INSTANCE_TRANSFORM2;
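Interpreting those packed types (my guesses: nshort4 = normalized int16x4, half2 = fp16x2, xint = a 32-bit packed value such as 10:10:10:2, unbyte4 = normalized uint8x4), the equivalent D3D12 input layout would look something like the sketch below; the exact formats and the per-instance transform setup are assumptions on my part:

#include <d3d12.h>

// Hypothetical reconstruction of the vertex input layout seen in the capture.
// Formats are educated guesses based on the packed type names above.
static const D3D12_INPUT_ELEMENT_DESC kVertexLayout[] =
{
    // VB0: quantized object-space position
    { "POSITION", 0, DXGI_FORMAT_R16G16B16A16_SNORM, 0, 0,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    // VB1: primary UV set in half precision
    { "TEXCOORD", 0, DXGI_FORMAT_R16G16_FLOAT, 1, 0,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    // VB2: packed normal and tangent (assumed 10:10:10:2)
    { "NORMAL",   0, DXGI_FORMAT_R10G10B10A2_UNORM, 2, 0,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "TANGENT",  0, DXGI_FORMAT_R10G10B10A2_UNORM, 2, 4,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    // VB3: vertex color plus a secondary UV set
    { "COLOR",    0, DXGI_FORMAT_R8G8B8A8_UNORM, 3, 0,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 1, DXGI_FORMAT_R16G16_FLOAT, 3, 4,
      D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
    // VB7: a 3x4 instance transform, advanced once per instance
    { "INSTANCE_TRANSFORM", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 7, 0,
      D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
    { "INSTANCE_TRANSFORM", 1, DXGI_FORMAT_R32G32B32A32_FLOAT, 7, 16,
      D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
    { "INSTANCE_TRANSFORM", 2, DXGI_FORMAT_R32G32B32A32_FLOAT, 7, 32,
      D3D12_INPUT_CLASSIFICATION_PER_INSTANCE_DATA, 1 },
};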
In the pixel shader, a 64x64 R16G16B16A16_UNORM noise texture is sometimes bound as an input; only the RG channels contain data, similar to what we’d use in SSAO for random normal rotations or offsets
All material textures are accessed through the descriptor table
RT0 in R10G10B10A2_UNORM is the albedo; the alpha channel is used as an object mask, and animated meshes are marked as 0x03
RT1 in R10G10B10A2_UNORM is the world-space normal, packed by offset (see the sketch after this G-Buffer breakdown)
RT2 in R8G8B8A8_UNORM is metallic-roughness and other attributes; animated meshes are not rendered here, and the alpha channel should be a transparency or emissive property for emissive-material objects
RT3 in R16G16_FLOAT is the screen-space motion vector:
The depth-stencil is in D32_FLOAT_S8X24_UINT format; stencil value 0x15 is used to mark character body meshes, 0x35 for faces, 0x95 for hair, and 0xA0 for trees, bushes, and other foliage
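“Packed by offset” presumably just means the usual remap from the [-1, 1] range to [0, 1] so the normal fits into a UNORM target; a tiny sketch of that, plus the stencil values listed above as constants (the names are mine):

#include <cstdint>

// Stencil reference values observed in the G-Buffer pass (names are made up).
constexpr std::uint8_t kStencilCharacterBody = 0x15;
constexpr std::uint8_t kStencilCharacterFace = 0x35;
constexpr std::uint8_t kStencilHair          = 0x95;
constexpr std::uint8_t kStencilFoliage       = 0xA0;

struct Float3 { float x, y, z; };

// Assumed "offset" packing of a world-space normal into a UNORM render target:
// [-1, 1] -> [0, 1] per channel, and back.
Float3 PackNormal(Float3 n)   { return { n.x * 0.5f + 0.5f, n.y * 0.5f + 0.5f, n.z * 0.5f + 0.5f }; }
Float3 UnpackNormal(Float3 p) { return { p.x * 2.0f - 1.0f, p.y * 2.0f - 1.0f, p.z * 2.0f - 1.0f }; }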
Drawing order:
The D32S8_TYPELESS depth-stencil buffer is converted to R32_TYPELESS at full-screen size and then downsampled to half size as R16_FLOAT and R8_UINT in 3 passes, for easier pixel sampling later.
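A half-resolution depth chain like this is usually built by taking the min or max of each 2x2 quad of the full-resolution depth; a CPU-side sketch of the idea (whether REDengine keeps min, max, or both is not something I checked):

#include <algorithm>
#include <cstdint>
#include <vector>

// Downsample a full-resolution depth buffer to half size by keeping the
// maximum (farthest, with a standard depth convention) of each 2x2 quad.
std::vector<float> DownsampleDepthMax(const std::vector<float>& depth,
                                      std::uint32_t width, std::uint32_t height)
{
    const std::uint32_t halfW = width / 2;
    const std::uint32_t halfH = height / 2;
    std::vector<float> half(halfW * halfH);

    for (std::uint32_t y = 0; y < halfH; ++y)
        for (std::uint32_t x = 0; x < halfW; ++x)
        {
            const std::uint32_t sx = x * 2;
            const std::uint32_t sy = y * 2;
            const float d0 = depth[(sy + 0) * width + (sx + 0)];
            const float d1 = depth[(sy + 0) * width + (sx + 1)];
            const float d2 = depth[(sy + 1) * width + (sx + 0)];
            const float d3 = depth[(sy + 1) * width + (sx + 1)];
            half[y * halfW + x] = std::max(std::max(d0, d1), std::max(d2, d3));
        }
    return half;
}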
All moving objects, foliage, and fur are marked with different stencil bits in this pass based on their last-frame screen-space positions, and the marks are then dilated by a few pixels (exactly the same solution as the temporal anti-aliasing in Uncharted 4: better anti-ghosting!). A rough sketch of the dilation idea follows below.
Motion-stencil pass:
The result is stored in an R8_UNORM RT and used later in the SSR and TAA passes
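The dilation step is essentially a small image-space max filter over the mask, so that a moving object’s disocclusion trail is also covered; a sketch with a made-up radius:

#include <algorithm>
#include <cstdint>
#include <vector>

// Dilate a flag mask by `radius` pixels: each output pixel takes the
// maximum flag found in its (2*radius+1)^2 neighborhood.
std::vector<std::uint8_t> DilateMask(const std::vector<std::uint8_t>& mask,
                                     int width, int height, int radius /* e.g. 2 */)
{
    std::vector<std::uint8_t> out(mask.size(), 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            std::uint8_t v = 0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx)
                {
                    const int sx = std::clamp(x + dx, 0, width - 1);
                    const int sy = std::clamp(y + dy, 0, height - 1);
                    v = std::max(v, mask[sy * width + sx]);
                }
            out[y * width + x] = v;
        }
    return out;
}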
R32_FLOAT:
An R16_TYPELESS texture is used first to render a few simple meshes (some kind of mesh proxy?), but that RT is never used.
The cascades (presumably cascaded shadow maps) are drawn into an R16_TYPELESS 2D texture array; no geometry-shader RT index is involved, it just executes the draw calls 4 times, once per cascade
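Rendering into each cascade without SV_RenderTargetArrayIndex simply means creating one DSV per array slice and replaying the (per-cascade) draw calls; a rough D3D12-style sketch, with all handles and the draw callback hypothetical:

#include <d3d12.h>

// Hypothetical: render 4 cascades into a 4-slice depth texture array,
// one DSV per slice, no geometry shader involved.
void RenderCascades(ID3D12Device* device,
                    ID3D12GraphicsCommandList* cmdList,
                    ID3D12Resource* cascadeArray,             // R16_TYPELESS, ArraySize = 4
                    D3D12_CPU_DESCRIPTOR_HANDLE dsvHeapStart, // 4 consecutive DSV slots
                    UINT dsvDescriptorSize,
                    void (*drawShadowCasters)(ID3D12GraphicsCommandList*, UINT cascadeIndex))
{
    for (UINT cascade = 0; cascade < 4; ++cascade)
    {
        D3D12_DEPTH_STENCIL_VIEW_DESC dsv = {};
        dsv.Format = DXGI_FORMAT_D16_UNORM;                   // typed view over the typeless resource
        dsv.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2DARRAY;
        dsv.Texture2DArray.FirstArraySlice = cascade;
        dsv.Texture2DArray.ArraySize = 1;

        D3D12_CPU_DESCRIPTOR_HANDLE handle = dsvHeapStart;
        handle.ptr += cascade * dsvDescriptorSize;
        device->CreateDepthStencilView(cascadeArray, &dsv, handle);

        cmdList->OMSetRenderTargets(0, nullptr, FALSE, &handle);
        cmdList->ClearDepthStencilView(handle, D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);
        drawShadowCasters(cmdList, cascade);                  // per-cascade draw calls
    }
}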
An R32_TYPELESS target at 1k is then converted to a 10-slice R16G16_UNORM texture array (should be VSM?)
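If that conversion really is building variance shadow maps, then each R16G16 texel would just store the first two moments of the depth distribution; a sketch of the per-texel conversion under that assumption:

#include <cstddef>
#include <vector>

struct Moments { float m1; float m2; }; // -> R16G16_UNORM (mean depth, mean squared depth)

// Assumed VSM conversion: take a shadow depth and store (d, d^2),
// which later allows a Chebyshev upper bound on the shadowing term.
std::vector<Moments> BuildVSMMoments(const std::vector<float>& shadowDepth)
{
    std::vector<Moments> out(shadowDepth.size());
    for (std::size_t i = 0; i < shadowDepth.size(); ++i)
    {
        const float d = shadowDepth[i];
        out[i] = { d, d * d };
    }
    return out;
}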
R11G11B10_FLOAT format:
And a stereographically projected sky radiance is also generated:
A 64x64x64 texture in R32G32_UINT format; the 2 channels contain index-like data. Slice 19, for example:
R11G11B10_FLOAT, with another R8_UNORM RT for reflectivity. The previously noised world-space normal and the motion-stencil RT are used as inputs; the last frame’s downscaled TAA output is also used for sampling:
An R16_FLOAT image, a 256x64 R16G16B16A16_FLOAT image, and a 256x1 R16G16B16A16_FLOAT image are generated by 3 compute dispatches, and they are finally used as part of the inputs for the next compute pass, which generates a structured buffer that is used widely when rendering the sky cubemaps and reflection probes
Well, what else can I say? Two decades of the 21st century have already passed, and Cyberpunk 2077 is basically a summation of the modern game rendering techniques that have matured over those years. Even without ray tracing enabled, the overall appearance of the graphics is magnificent; it bursts my euphoria whenever I’m pacing around Night City. Despite all the controversies over the release and the gameplay glitches, which are mainly caused by a not-quite-robust-enough physics system and an overestimated streaming implementation, the game’s rendering is quite well crafted. Still, I expected to see more state-of-the-art architectural practices, such as a heavier GPU-driven rendering pipeline (which has been getting more and more popular since 2016-2017). What’s your opinion about it? And what would change if we came back 5 years from now for another review? Maybe we’d laugh at the “outdated” techniques in Cyberpunk 2077, but I’m sure we’ll still admit that real-time rendering is always full of exciting directions to explore!