Welcome to our guest posting from Sebastian Aaltonen, co-founder of Second Order LTD and previously senior rendering lead at Ubisoft®. Second Order have recently announced their first game, Claybook! Alongside the game looking like really great fun, its renderer is novel, using the GPU in very non-traditional ways in order to achieve its look. Sebastian is going to cover an interesting problem he faced while working on Claybook: how you can optimize GPU occupancy and resource usage of compute shaders that use large thread groups.

Occupancy and Resource Usage Optimization with Large Thread Groups

When using a compute shader, it is important to consider the impact of thread group size on performance. Limited register space, memory latency and SIMD occupancy each affect shader performance in different ways. This article discusses potential performance issues, along with techniques and optimizations that can dramatically increase performance when correctly applied. It focuses on large thread groups, but the tips and tricks are helpful in the common case as well.

The DirectX® 11 Shader Model 5 compute shader specification (2009) mandates a maximum allowable memory size per thread group of 32 KiB, and a maximum workgroup size of 1024 threads. There is no specified maximum register count, and the compiler can spill registers to memory if register pressure requires it. However, because of memory latency, spilling has a significant negative performance impact and should be avoided in production code.

Modern AMD GPUs are able to execute two groups of 1024 threads simultaneously on a single compute unit (CU). However, in order to maximize occupancy, shaders must minimize register and LDS usage so that the resources required by all threads fit within the CU.

AMD GCN compute unit (CU)

Consider the architecture of a GCN compute unit:

A GCN CU includes four SIMDs, each with a 64 KiB register file of 32-bit VGPRs (vector general-purpose registers), for a total of 65,536 VGPRs per CU. Every CU also has a register file of 32-bit SGPRs (scalar general-purpose registers). Until GCN3, each SIMD contained 512 SGPRs, and from GCN3 on the count was raised to 800.
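To make the register arithmetic concrete, here is a minimal sketch that derives the per-CU VGPR total from the figures quoted above and estimates how many VGPRs each thread could use if a given number of thread groups is to stay resident on one CU. The function names are hypothetical, and the model is deliberately simplified: it assumes registers are divided evenly across threads and ignores allocation granularity and any per-wave limits.

```python
# Figures taken from the text: four SIMDs per CU, each with a
# 64 KiB register file of 32-bit VGPRs.
SIMDS_PER_CU = 4
VGPR_FILE_BYTES_PER_SIMD = 64 * 1024
BYTES_PER_VGPR = 4  # one 32-bit register


def vgprs_per_cu() -> int:
    """Total number of 32-bit VGPRs available in one CU."""
    return SIMDS_PER_CU * VGPR_FILE_BYTES_PER_SIMD // BYTES_PER_VGPR


def vgpr_budget_per_thread(group_size: int, resident_groups: int) -> int:
    """Simplified upper bound on VGPRs per thread.

    Assumption (not from the article): the register file is split
    evenly across every thread of every resident group.
    """
    return vgprs_per_cu() // (group_size * resident_groups)


if __name__ == "__main__":
    print("VGPRs per CU:", vgprs_per_cu())
    for groups in (1, 2):
        budget = vgpr_budget_per_thread(1024, groups)
        print(f"{groups} group(s) of 1024 threads: {budget} VGPRs per thread")
```

Under this simplified model, keeping two groups of 1024 threads resident leaves a budget of 32 VGPRs per thread, while a single resident group leaves 64. Real hardware allocates registers per wave and in fixed-size blocks, so treat these numbers as an approximation rather than an exact limit.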