AMD has published a patent application describing how to spread the rendering load across multiple GPU chiplets. A game scene is divided into individual blocks and distributed to the chiplets in order to make better use of the shaders in games. A two-stage ("two-level") binning process is used for this purpose.
AMD publishes patent application for GPU chiplets to make better use of shader resources
AMD’s newly published patent applications provide further insight into the company’s plans for next-generation GPU and CPU technology in the years to come. At the end of June, 54 patent applications were approved for publication. It is not known which of the more than fifty published applications will actually find their way into AMD’s products, but they describe in detail the approaches the company is pursuing in the coming years.
One application, noted by community member @ETI1120 on the ComputerBase website, patent number US20220207827, describes binning image data in two stages in order to efficiently distribute the rendering load of a GPU across multiple chiplets. AMD first filed the application with the US Patent Office late last year.
When image data is rasterized on a GPU, the shader units, essentially the ALUs, all perform the same job: each one assigns a color value to an individual pixel. The textured polygons visible at a given pixel of a game scene determine that pixel’s color. The task is identical in principle for every pixel and differs only in the texture data that happens to lie at each location. This method is called SIMD, or Single Instruction, Multiple Data.
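As a rough illustration of this principle, the sketch below applies one and the same shading function to every pixel of a frame; only the input data (the sampled texture value) differs per pixel. The dummy texture and the simple "multiply by a light factor" shading rule are assumptions for illustration and are not taken from the patent.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-pixel shading rule: the same instruction for every pixel,
// only the sampled texture data differs (the SIMD principle, illustrative only).
struct Pixel { uint8_t r, g, b; };

Pixel shade(Pixel texel, float light) {
    auto scale = [&](uint8_t c) {
        float v = c * light;
        return static_cast<uint8_t>(v > 255.0f ? 255.0f : v);
    };
    return { scale(texel.r), scale(texel.g), scale(texel.b) };
}

int main() {
    const int width = 1920, height = 1080;
    std::vector<Pixel> texture(width * height, Pixel{128, 96, 64}); // dummy texture data
    std::vector<Pixel> framebuffer(width * height);

    // On a GPU this loop runs in lockstep across thousands of shader lanes;
    // here it is written sequentially purely to show that the operation is
    // identical for every pixel and only the input differs.
    const float light = 1.2f;
    for (size_t i = 0; i < texture.size(); ++i) {
        framebuffer[i] = shade(texture[i], light);
    }
    return 0;
}
```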

For most current games, shading is not the only task a GPU performs. Several post-processing passes follow the initial shading, for example anti-aliasing, shadowing, and occlusion of the game environment. Ray tracing, by contrast, runs alongside shading and adds yet another type of computation.
For games, this graphics workload scales almost ideally across the several thousand compute units of a GPU. This differs from CPUs, where applications must be specially written to take advantage of additional cores. The GPU’s scheduler makes this possible by dividing the work into smaller, more digestible tasks for the compute units, a process known as binning: the image to be rendered is divided into separate blocks, each containing a fixed number of pixels. Each block is computed by a sub-unit of the graphics processor and then synchronized and assembled into the final image. Pixels still waiting to be computed are collected into further blocks until every sub-unit of the graphics card is fully utilized. The compute power of the shaders, the memory bandwidth, and the cache sizes are all taken into account.
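A simplified sketch of this kind of screen-space binning is shown below: the frame is split into fixed-size pixel blocks, and the blocks are handed out to compute units in round-robin fashion. The tile size and the round-robin assignment are illustrative assumptions; neither the article nor the patent prescribes a specific scheduling policy.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// A screen-space "bin": a rectangular block of pixels assigned to one compute unit.
struct Bin {
    int x0, y0, x1, y1;   // pixel bounds of the block
    int compute_unit;     // which sub-unit of the GPU processes it
};

// Split a frame into tile_size x tile_size bins and distribute them round-robin
// across num_compute_units. Tile size and policy are assumptions for illustration.
std::vector<Bin> bin_frame(int width, int height, int tile_size, int num_compute_units) {
    std::vector<Bin> bins;
    int next_unit = 0;
    for (int y = 0; y < height; y += tile_size) {
        for (int x = 0; x < width; x += tile_size) {
            bins.push_back({x, y,
                            std::min(x + tile_size, width),
                            std::min(y + tile_size, height),
                            next_unit});
            next_unit = (next_unit + 1) % num_compute_units;
        }
    }
    return bins;
}

int main() {
    auto bins = bin_frame(1920, 1080, 64, 40); // e.g. 40 compute units
    std::printf("%zu bins created\n", bins.size());
    return 0;
}
```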

AMD explains in the patent that this splitting and reassembling of work requires a fast, complete data connection between all parts of the GPU, and that is where chiplets become a problem: off-chip links have higher latency, which slows the whole process down.
CPUs made the transition to chiplets comparatively effortlessly, since their tasks can already be spread across multiple largely independent cores. GPUs do not offer the same flexibility, and their scheduling cannot simply be split up the way work was distributed on early dual-core processors.

AMD recognizes this and addresses the problem by changing the rasterization pipeline so that tasks can be distributed across multiple GPU chiplets, similar to CPUs. This requires a more advanced binning technique, which the company introduces as “two-level binning”, also referred to as “hybrid binning”.
With hybrid binning, the subdivision happens in two separate stages rather than the scene being split directly into fixed pixel blocks. In the first stage, the 3D environment is projected into a 2D image. This stage, vertex shading, takes place before rasterization and generates comparatively little load, so it runs on the GPU’s first chiplet. Once the geometry of the scene has been processed, it is grouped into coarse bins, and each coarse bin is assigned to a single GPU chiplet. Only then do the familiar tasks such as rasterization and post-processing begin.
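The following sketch outlines how such a two-level scheme could look in principle: a leading chiplet groups the vertex-shaded screen space into coarse bins and hands each coarse bin to one chiplet, which then subdivides its bin into the fine pixel blocks used for rasterization. The data structures and the simple strip-based coarse split are assumptions for illustration; the patent text defines the actual mechanism.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative two-level ("hybrid") binning sketch, not the patented algorithm.
struct CoarseBin { int x0, y0, x1, y1; int chiplet; };
struct FineBin   { int x0, y0, x1, y1; };

// Stage 1 (on the first chiplet): after vertex shading, split screen space into
// one coarse bin per chiplet. Here the frame is simply cut into vertical strips.
std::vector<CoarseBin> coarse_binning(int width, int height, int num_chiplets) {
    std::vector<CoarseBin> bins;
    int strip = (width + num_chiplets - 1) / num_chiplets;
    for (int c = 0; c < num_chiplets; ++c) {
        bins.push_back({c * strip, 0, std::min((c + 1) * strip, width), height, c});
    }
    return bins;
}

// Stage 2 (on each chiplet): subdivide the assigned coarse bin into fine pixel
// blocks for rasterization and post-processing.
std::vector<FineBin> fine_binning(const CoarseBin& cb, int tile_size) {
    std::vector<FineBin> bins;
    for (int y = cb.y0; y < cb.y1; y += tile_size)
        for (int x = cb.x0; x < cb.x1; x += tile_size)
            bins.push_back({x, y, std::min(x + tile_size, cb.x1), std::min(y + tile_size, cb.y1)});
    return bins;
}

int main() {
    for (const auto& cb : coarse_binning(3840, 2160, 4)) {      // e.g. 4 GPU chiplets
        auto fine = fine_binning(cb, 64);                       // 64x64 pixel tiles
        std::printf("chiplet %d rasterizes %zu fine bins\n", cb.chiplet, fine.size());
    }
    return 0;
}
```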
It is not known when AMD intends to start using this new process, or whether the patent will even be granted. However, it does give us a glimpse into a future of more efficient GPU processing.
News Sources: ComputerBase, Free Patents Online