Rendering analysis - Cyberpunk 2077

(Spoiler-free post!)

Quite a lot of people I know, inside and outside this industry, are playing the long-awaited “gargantuan maelstrom” now. After a few hours immersed in the stunning views (and bugs, of course) of Night City, I started wondering how a frame is rendered in Cyberpunk 2077 (every graphics programmer’s instinct!). So I opened RenderDoc and PIX; luckily REDengine didn’t behave as unfriendly as some other triple-A titles (yes, Watch Dogs 2, I mean you), and I got a frame capture without any problems. (Disclaimer: I’m pretty sure that later others like Alain Galvan, Anton Schreiner, Adrian Courrèges or even folks from CDPR will bring us more detailed and accurate demonstrations of how things work. I’m just poking around for fun and welcome any discussions and corrections!)

Overview

  • I’m running the game with the Ultra preset at 1080p, without ray tracing or DLSS enabled, for the sake of my years-old GTX 1070

  • I captured two final results: one around the main character’s apartment during the day (let’s name it main street) and another around a river bank at night (nobody dislikes eye candy!):

99_FinalResult_MainStreet 99_FinalResult_RiverBank

  • Some pictures’ colors are manually clamped for easier visualization
  • It’s a DirectX 12-only game on Windows (if they started development around 2013 or slightly earlier, I’m curious how much effort it took to upgrade the engine)
  • Only 2 Descriptor Heaps are allocated: one for around half a million CBV/SRV/UAVs and another for ~2k Samplers

00_DescriptorHeap_CSU

00_DescriptorHeap_Sampler

  • There is one Graphics queue, one Compute queue, and one Copy queue; the GPU work is balanced quite evenly

  • ID3D12GraphicsCommandList::CopyBufferRegion is frequently called between passes to copy RT results

  • PIX threw out quite a lot of recommendations to remove the D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS flag; I’m not sure whether my capture just happened not to use some resources, or whether they really do need some optimization (in my experience, unordered access can hurt performance quite a bit)

  • Frame timeline: intensive during the early geometry stages, then the deferred work kicks in

00_FrameTimeline

Pre-geometry passes

Billboard and GUI

  • The entire frame starts by preparing the resources for the in-game billboards and screens

  • The first few copy operations are executed on 3 textures: the destination textures’ format is DXGI_FORMAT_R8_UNORM, the first is 480x272 and the other two are half that size; the source texture appears to be a mega texture which is also the destination of many later copy operations. (Question: what is the purpose of these copies? The night scene doesn’t have these operations; is it related to streaming?)

  • One of the three very first copied textures:

01_Degree_0

  • In the next pass they are used as shader inputs:

01_Full

  • Billboards are drawn into one texture where all the mips are tightly packed together

  • Billboards:

02_Billboard

  • Then the elements for the cyberish-GUI are drawn and also copied to the mega texture

  • Part of the GUI elements:

02_GUI

Sky visibility/top-view shadow/mini map

  • The RT of the next pass is a single D16_UNORM 2k texture; around 400 instanced draw calls are executed in the captured scene (not sure how this RT is used later)

Shadowmap from top: 03_Shadowmap_Top

Unknown compute pass 01

  • There are 2 dispatches over a 64x64x64 R16G16B16A16_FLOAT 3D texture in this pass (it looks like voxelized normals/positions?). The thread group count is 2x2x64 the first time and 16x16x16 the second time; the RT is later used multiple times in the terrain pass, ocean wave pass, depth pre-pass, and base material pass, so it is definitely something crucial for geometry information (see the sketch below)

  • Some slices of the 3D Normal:
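
Only the thread group counts are visible in the capture, but a 16x16x16 dispatch covers the full 64x64x64 volume exactly if each group is 4x4x4 threads. A minimal HLSL sketch under that assumption (the group size and everything in the shader body are guesses):

// Hypothetical layout for the second dispatch: Dispatch(16, 16, 16) with
// 4x4x4 threads per group gives 16 * 4 = 64 threads per axis, i.e. exactly
// one thread per voxel of the 64x64x64 R16G16B16A16_FLOAT volume.
RWTexture3D<float4> VoxelVolume : register(u0);

[numthreads(4, 4, 4)]
void CSMain(uint3 voxel : SV_DispatchThreadID)
{
    // Whatever gets voxelized (normal? position?) would be written here.
    VoxelVolume[voxel] = float4(0.0, 0.0, 0.0, 0.0);
}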

Unknown compute pass 02

  • 3 dispatches made up of 63x1x1, 3x1x1, and then again 63x1x1 thread groups (in the night scene capture it’s 38x1x1, 3x1x1, and 38x1x1); a few ByteAddressBuffers and StructuredBuffers are bound as UAVs (all of them are used later in the base material pass)

Terrain

  • A 2-slice R8G8B8A8_UNORM 2D texture array and an R16_UINT 2D texture are drawn at 1k resolution (they look like chunks of terrain)

  • 2 draw calls and RTs:

Unknown color pass 01

  • 2 draw calls are issued (only in the main street capture) but nothing is drawn due to clipping (and the meshes are barely comprehensible; they look like 2 levels of a LOD-ed mesh); the RT is a single 256x256 R16_UINT 2D texture

Ocean wave

  • The top-view ocean map mask and a generated wave noise texture are bound as inputs, the RT format changes to R16_UNORM, and there is no tessellation shader stage; some regular grid plate meshes are used to directly draw the different chunks

  • Ocean wave noise in R16G16B16A16_FLOAT and mask in BC7_SRGB:


Geometry passes

Depth-pre pass

  • The RT is in D32_S8_TYPELESS format; it surely helps optimize the later pixel-heavy work

  • Depth-pre pass result:

07_Depth_Pre

Base material passes

  • All vertex buffers uploaded to GPU memory are packed as SoA (an equivalent HLSL input struct is sketched after the raw layout below):
// VB0
nshort4 POSITION0;
// VB1
half2 TEXCOORD0;
// VB2
xint NORMAL0;
xint TANGENT0;
// VB3
unbyte4 COLOR0;
half2 TEXCOORD1;
// VB7
float4 INSTANCE_TRANSFORM0;
float4 INSTANCE_TRANSFORM1;
float4 INSTANCE_TRANSFORM2;
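
Mapped to an HLSL vertex input, those streams would look roughly like the struct below; the float types are what the input assembler unpacks the compressed storage into, and reading nshort4/xint as 16-bit SNORM and 10:10:10:2 packed data is my interpretation of RenderDoc’s type names:

// Roughly equivalent HLSL vertex input; storage formats noted per field.
struct VSInput
{
    float4 position     : POSITION0;           // VB0: nshort4, 16-bit SNORM, likely dequantized by a per-object scale/offset
    float2 uv0          : TEXCOORD0;           // VB1: half2
    float4 normal       : NORMAL0;             // VB2: xint, presumably 10:10:10:2 packed
    float4 tangent      : TANGENT0;            // VB2: xint, presumably 10:10:10:2 packed
    float4 color        : COLOR0;              // VB3: unbyte4, 8-bit UNORM
    float2 uv1          : TEXCOORD1;           // VB3: half2
    float4 instanceRow0 : INSTANCE_TRANSFORM0; // VB7: per-instance transform, row 0
    float4 instanceRow1 : INSTANCE_TRANSFORM1; // VB7: row 1
    float4 instanceRow2 : INSTANCE_TRANSFORM2; // VB7: row 2
};
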
  • In the pixel shader, a 64x64 R16G16B16A16_UNORM noise texture is sometimes bound as input; only the RG channels contain data, similar to what we’d use in SSAO for random normal rotations or offsets

  • All material textures are accessed through the descriptor table

  • RT0 in R10G10B10A2_UNORM is the albedo; the alpha channel is used as an object mask, and animated meshes are marked as 0x03

  • RT1 in R10G10B10A2_UNORM is the world-space normal, packed with the offset 0.5x + 0.5 (an encode/decode sketch follows after these bullets)

  • RT2 in R8G8B8A8_UNORM is metallic-roughness and other attributes; animated meshes are not rendered here, and the alpha channel should be the transparency or emissive property for emissive-material objects

  • RT3 in R16G16_FLOAT is the screen-space motion vector:

  • The depth-stencil is in D32_FLOAT_S8X24_UINT format; stencil value 0x15 is used to mark character body meshes, 0x35 for faces, 0x95 for hair, and 0xA0 for trees, bushes, and other foliage
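
A minimal sketch of that normal encoding, assuming a plain [-1, 1] to [0, 1] remap into the R10G10B10A2_UNORM target (the capture only reveals the 0.5x + 0.5 offset itself):

// Remap a world-space unit normal so it fits a UNORM render target.
float3 EncodeNormal(float3 worldNormal) // assumes worldNormal is normalized
{
    return worldNormal * 0.5 + 0.5; // [-1, 1] -> [0, 1]
}

float3 DecodeNormal(float3 encoded)
{
    return encoded * 2.0 - 1.0;     // [0, 1] -> [-1, 1]
}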

  • Drawing order:

    1. Static Mesh
    2. Ground and emissive objects
    3. Animated objects and destructibles (?)
    4. Foliage
    5. Decals
    6. Mask for fur
    7. Fur

DS convert passes

The D32S8_TYPELESS depth-stencil buffer is converted to R32_TYPELESS at full-screen size and then downsampled to half size as R16_FLOAT and R8_UINT over 3 passes, for easier pixel sampling later.
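
A minimal sketch of one plausible half-resolution step; the capture only shows the format conversions and output sizes, so the filter choice (max here) is an assumption:

// Hypothetical half-resolution depth downsample taking the farthest sample.
Texture2D<float>   FullResDepth : register(t0);
RWTexture2D<float> HalfResDepth : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    uint2 base = id.xy * 2;
    float d0 = FullResDepth[base + uint2(0, 0)];
    float d1 = FullResDepth[base + uint2(1, 0)];
    float d2 = FullResDepth[base + uint2(0, 1)];
    float d3 = FullResDepth[base + uint2(1, 1)];
    HalfResDepth[id.xy] = max(max(d0, d1), max(d2, d3));
}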

Motion-stencil pass

  • All moving objects, foliage, and fur are marked with different bits in this pass according to their last-frame screen-space positions, and the marks are then dilated by a few pixels (exactly the same solution as in Temporal Antialiasing in Uncharted 4: better anti-ghosting!)

  • Motion-stencil pass:

10_MotionStencil

Mask and LUT passes

Reflection mask

  • Parts of some objects are drawn onto an R8_UNORM RT and used later in SSR and TAA passes

11_Reflection

Compute pass to noise normal

  • The RT is used for sun shadow, AO(?), direct sunlight, SSR and indirect skylight passes

12_Noised_Normal

  • Noise sample:

12_Noised_Normal_Noise_Sample

Some passes to mask the sky out

  • All of them are at quite low resolution and mipmapped; the mipmap calculation executes multiple times

  • And finally, there is a graphics pass that draws a quite accurate sky mask to an R32_FLOAT RT:

13_SkyMaskAccurate

(HB)AO (?)

  • All at 960x540; it looks like some sort of normal data required by the AO:

Color-grading LUT

  • Color-grading LUT:

Ocean wave noise

  • Around 22 compute dispatches with 32x32x1 thread groups generate the noise for water, which is used in the earlier ocean wave pass

Light and shadow passes

CSM passes for direct light

  • A 2k R16_TYPELESS texture is used first to render a few simple meshes (some kind of mesh proxy?), but the RT is never used afterwards
  • Then 4 classic CSM cascades are rendered and stored as 4 slices of a 4k R16_TYPELESS 2D texture array; no geometry shader RT index is involved, the draw calls are simply executed 4 times, once per cascade (a sampling sketch follows below)
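
For reference, a hedged sketch of how such a 4-slice cascade array is typically sampled at shading time; the split depths, matrices, and conventions here are illustrative, not extracted from the game:

// Hypothetical cascade selection and sampling for a 4-slice CSM array.
Texture2DArray<float>  CascadeShadowMap : register(t0);
SamplerComparisonState ShadowCmpSampler : register(s0);

float SampleCSM(float3 worldPos, float viewDepth,
                float4x4 cascadeViewProj[4], float4 splitDepths)
{
    // Pick the first cascade whose far split still contains this pixel.
    int cascade = 0;
    [unroll]
    for (int i = 0; i < 3; ++i)
        if (viewDepth > splitDepths[i])
            cascade = i + 1;

    float4 shadowPos = mul(cascadeViewProj[cascade], float4(worldPos, 1.0));
    shadowPos.xyz /= shadowPos.w;
    float2 uv = shadowPos.xy * float2(0.5, -0.5) + 0.5; // NDC -> texture space

    return CascadeShadowMap.SampleCmpLevelZero(
        ShadowCmpSampler, float3(uv, cascade), shadowPos.z);
}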

Omni shadow maps for point/area lights

  • Every omni shadow map is first rendered as R32_TYPELESS at 1k, then converted to a 10-slice R16G16_UNORM array (VSM, perhaps? A moment-conversion sketch follows the image below)

17_OmniShadowmap
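
If that conversion really is VSM, each R16G16 texel would store the first two depth moments for a later Chebyshev upper-bound test. A minimal sketch assuming exactly that:

// Hypothetical R32 -> R16G16 moment conversion for variance shadow maps.
Texture2D<float>    RawShadowDepth : register(t0);
RWTexture2D<float2> Moments        : register(u0);

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float d = RawShadowDepth[id.xy];
    Moments[id.xy] = float2(d, d * d); // E[x] and E[x^2]
}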

Cloud distribution

18_Cloud

Sky cubemaps

  • 7 mipmap levels in R11G11B10_FLOAT format:

And a stereographically projected sky radiance map is also generated: 18_SkyProjected

Coat mask (?)

Clustered light index (?)

  • Some compute-only passes generate a few 3D textures that contain only indices:

Direct light shadow

  • At full-screen resolution; it comes from the sun during the day and the moon at night

21_SunShadow

Local lights mask (?)

  • At 120x68 and 60x34 resolutions

22_LocalLightMask

Unconfirmed compute dispatches to update some StructuredBuffers

Environment Radiance capture

  • First, a cubemap version is generated

  • Then all of them are converted to 512x256 2D dual-paraboloid textures, each containing 6 mips; 32 captures are stored as slices in one 2D texture array, and there are ~6 texture arrays in the main street scene (a mapping sketch follows below)

  • A 64x64x64 texture in R32G32_UINT format; both channels contain index-like data

  • Slice 19, for example:

24_GIProbeIndices_Slice_19
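
A hedged sketch of the standard dual-paraboloid mapping; only the 512x256 size and the cubemap-to-2D conversion are visible in the capture, so the side-by-side packing of the two hemispheres is my assumption:

// Map a unit direction to dual-paraboloid texture space: the front
// hemisphere (z >= 0) lands in the left half of the slice, the back in the right.
float2 DualParaboloidUV(float3 dir) // dir must be normalized
{
    bool front = dir.z >= 0.0;
    float3 d = front ? dir : float3(dir.xy, -dir.z);
    float2 p = d.xy / (1.0 + d.z);           // paraboloid projection, [-1, 1]
    float2 uv = p * 0.5 + 0.5;               // remap to [0, 1] per hemisphere
    uv.x = 0.5 * uv.x + (front ? 0.0 : 0.5); // pack both halves side by side
    return uv;
}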

Landscape maps

  • A few layers are drawn here, from normals to a kind of heatmap, at 1k resolution

25_LandscapeMap_Light 25_LandscapeMap_Heat

Local volumetric fog

  • 2 mips from 240x136 to 120x68, each with 128 slices; mip 0 is from the main camera view, and it should be some kind of ray marching:

Head mask

  • The most hilarious one; it should be used later to render detailed facial expressions

27_HeadMask

Direct and emissive lights

  • Tiled lighting; each tile is 16x16 in size, and everything is executed as indirect compute commands (a sketch follows below):
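
Only the 16x16 tile size and the indirect compute execution are visible in the capture; the light list layout and all the names below are assumptions. A minimal sketch of the general tiled-lighting pattern:

// One 16x16 thread group shades one screen tile using that tile's light list.
struct PointLight { float3 position; float radius; float3 color; float pad; };

cbuffer TileConstants : register(b0)
{
    uint NumTilesX;        // number of tiles per row (assumed)
    uint MaxLightsPerTile; // stride of the per-tile index list (assumed)
};

StructuredBuffer<PointLight> Lights           : register(t0);
StructuredBuffer<uint>       TileLightIndices : register(t1);
StructuredBuffer<uint>       TileLightCounts  : register(t2);
Texture2D<float4>            GBufferAlbedo    : register(t3);
RWTexture2D<float4>          Output           : register(u0);

[numthreads(16, 16, 1)]
void CSMain(uint3 id : SV_DispatchThreadID, uint3 gid : SV_GroupID)
{
    uint tile  = gid.y * NumTilesX + gid.x;
    uint count = TileLightCounts[tile];

    float3 radiance = 0.0;
    for (uint i = 0; i < count; ++i)
    {
        PointLight light = Lights[TileLightIndices[tile * MaxLightsPerTile + i]];
        // A real shader would evaluate the BRDF against the G-buffer here.
        radiance += light.color;
    }
    Output[id.xy] = GBufferAlbedo[id.xy] * float4(radiance, 1.0);
}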

Sky

  • Only the masked sky region is drawn

SSR

  • A low resolution depth mask is drawn first:

30_SSRTile

  • Then the full-screen-size SSR is drawn in R11G11B10_FLOAT with another R8_UNORM RT for reflectivity. The previously noised world-space normals and the motion stencil RT are used as inputs, and the last frame’s downscaled TAA output is used for sampling (a ray-march sketch follows below):
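
A hedged sketch of a generic screen-space reflection ray march; the step counts, thickness test, and buffer names are illustrative, and only the inputs (depth, noised normals, last frame’s color) match what the capture shows:

// Generic linear SSR march in view space; not CDPR's actual implementation.
Texture2D<float>  SceneDepth  : register(t0); // linearized view-space depth (assumed)
Texture2D<float4> PrevFrame   : register(t1); // last frame's downscaled TAA output
SamplerState      LinearClamp : register(s0);

float4 TraceSSR(float3 viewPos, float3 reflectDir, float4x4 proj)
{
    const int   kSteps     = 32;   // assumed
    const float kStepSize  = 0.1;  // view-space units per step (assumed)
    const float kThickness = 0.2;  // hit tolerance (assumed)

    for (int i = 1; i <= kSteps; ++i)
    {
        float3 p = viewPos + reflectDir * (kStepSize * i);
        float4 clip = mul(proj, float4(p, 1.0));
        float2 uv = (clip.xy / clip.w) * float2(0.5, -0.5) + 0.5;
        if (any(uv < 0.0) || any(uv > 1.0))
            break; // ray left the screen

        float sceneZ = SceneDepth.SampleLevel(LinearClamp, uv, 0);
        if (p.z > sceneZ && p.z - sceneZ < kThickness) // ray went behind geometry
            return float4(PrevFrame.SampleLevel(LinearClamp, uv, 0).rgb, 1.0);
    }
    return 0.0; // miss: fall back to environment probes
}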

Indirect light

  • Some artifacts are visible (light leaking because of VCT?):

32_IndirectLight

All lights composition pass

  • 2 RTs; specular reflections are stored in RT1:

33_LightCompositionPass_RT0 33_LightCompositionPass_RT1

Transparent LUT

  • Compute shaders are invoked a few times to generate some LUTs for the later transparent and holographic objects:

34_TransparentLUT_1 34_TransparentLUT_2

Skin and eyes

  • Skin irradiance is rendered in screen space:

35_SkinIrradiance

  • Then eyes are rendered:

AO(?)/GI shadow

  • Additional shadows are added to pixels that should not be lit too much:

Volumetric fog

  • Just a simple image composition pass:

Water reflection

  • A few passes generate the water reflection, including all the related masks and noise:

Post-processing passes

TAA

  • Next is a full-screen, no-surprise TAA pass (a resolve sketch follows below):
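
A minimal sketch of the usual TAA resolve, with neighborhood clamping to limit ghosting; the blend weight and the clamp are generic choices, not values pulled from the game:

// Generic TAA resolve: reproject history via motion vectors, clamp it to
// the current 3x3 neighborhood, and blend with a small current-frame weight.
Texture2D<float4>   CurrentColor  : register(t0);
Texture2D<float4>   HistoryColor  : register(t1);
Texture2D<float2>   MotionVectors : register(t2);
SamplerState        LinearClamp   : register(s0);
RWTexture2D<float4> Output        : register(u0);

cbuffer TAAConstants : register(b0) { float2 InvResolution; }

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float2 uv = (id.xy + 0.5) * InvResolution;
    float3 current = CurrentColor[id.xy].rgb;

    // Min/max of the 3x3 neighborhood bounds the acceptable history color.
    float3 nmin = current, nmax = current;
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
    {
        float3 c = CurrentColor[int2(id.xy) + int2(x, y)].rgb;
        nmin = min(nmin, c);
        nmax = max(nmax, c);
    }

    float2 motion = MotionVectors[id.xy];
    float3 history = HistoryColor.SampleLevel(LinearClamp, uv - motion, 0).rgb;
    history = clamp(history, nmin, nmax);

    Output[id.xy] = float4(lerp(history, current, 0.1), 1.0);
}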

Blur

  • First, the TAA-ed result is downscaled to 1/2, 1/4, and 1/8 size in 1 compute pass and stored in 3 mips; each mip is then blurred horizontally and vertically (a separable blur sketch follows below)
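
A hedged sketch of the horizontal half of such a separable blur over one mip; the 5-tap Gaussian weights are a common choice, not necessarily the game’s:

// One horizontal pass of a separable Gaussian blur (1-4-6-4-1 kernel / 16).
Texture2D<float4>   Source : register(t0);
RWTexture2D<float4> Dest   : register(u0);

static const float kWeights[3] = { 0.375, 0.25, 0.0625 };

[numthreads(8, 8, 1)]
void BlurH(uint3 id : SV_DispatchThreadID)
{
    float4 sum = Source[id.xy] * kWeights[0];
    for (int i = 1; i <= 2; ++i)
    {
        sum += Source[int2(id.xy) + int2( i, 0)] * kWeights[i];
        sum += Source[int2(id.xy) + int2(-i, 0)] * kWeights[i];
    }
    Dest[id.xy] = sum; // the vertical pass repeats this with int2(0, +/-i)
}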

Unknown transparent

Cloud

  • The distribution noise of the cloud is pre-generated:

Holo and transparent objects

  • All the futuristic holographic objects are drawn next (a significant number of low-triangle-count meshes), including the water surface and some local light scattering:

Post-TAA

  • 2 downscaled images are generated again at 1/2 and 1/4 sizes with all the transparent objects on them; finally, the image is converted back to full-screen size and sharpened

HDR LUT(?)

  • A 256x256 R16_FLOAT image, a 256x64 R16G16B16A16_FLOAT image, and a 256x1 R16G16B16A16_FLOAT image are generated by 3 compute dispatches; they are then used as part of the inputs to the next compute pass, which generates a structured buffer that is used widely when rendering the sky cubemaps and reflection probes

45_HDRLUT_01 45_HDRLUT_02

Bloom

  • The full-screen image is downscaled by half 6 times and bloomed

Camera lens effects

  • Rendered on a half-screen size RT

47_LensFlare

Color grading

  • The color-graded image is then converted to LDR

48_ColorGrading

Gamma correction

  • The gamma correction is executed at a weird resolution, 456x256 (a sketch of the usual transform follows the image below)

49_GammaCorrection
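
For reference, the standard transform such a pass would apply; whether the game uses the exact sRGB piecewise curve (sketched below) or a plain pow(c, 1.0 / 2.2) approximation is not visible in the capture:

// Exact sRGB encode, applied per channel on linear input.
float3 LinearToSRGB(float3 c)
{
    return (c <= 0.0031308) ? (c * 12.92)
                            : (1.055 * pow(c, 1.0 / 2.4) - 0.055);
}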

GUI elements

  • All GUI elements are drawn on a full HD RT and the corresponding mipmaps are generated, and then a few compute passes add bloom to them

50_GUI

Film Grain

  • The noise LUT is generated on-the-fly

51_FilmGrain

Swap chain image

  • Finally, the result of all the above is presented to the screen

99_FinalResult_MainStreet 99_FinalResult_RiverBank

Undrawnable conclusions

Well, what else can I say? Two decades of the 21st century have already passed, and Cyberpunk 2077 is basically a summation of the modern game rendering techniques that have progressed over the years. Even without ray tracing enabled, the overall appearance of the graphics is magnificent; it just bursts my euphoria when I’m pacing around Night City. Despite all the controversies over the release and the gameplay glitches, mainly caused by a not-robust-enough physics system and an overestimated streaming implementation, the game’s rendering is quite well-crafted. Still, I expected to see more state-of-the-art architectural practices, such as a heavier GPU-driven rendering pipeline (which has become more and more popular since 2016-2017). What’s your opinion? And what might change if we come back 5 years from now for another review? Maybe we’d laugh at the “outdated” techniques in Cyberpunk 2077, but I’m sure we’ll still admit that real-time rendering is always full of exciting directions to explore!

Published Dec 12, 2020
