I’m assuming the reader has some knowledge in graphic programming (with D3D or GL) and generally knows how modern CPU works.
The specs of RSX
RSX is the graphic processor of the PS3. The acronym stands for Reality Synthetiser according to Wikipedia. It’s actually based from Nvidia own Geforce 7800 Gtx, a directX9 class gpu slightly modified to allow Cell’s SPU to do image processing. It has 256 MB of local memory (PS3 terminology for « video memory ») with a 22.4 GB/s bandwidth according to Wikipedia again. It’s processing power is 228 Gflops/s and supports up to 4 render targets.
The aforementioned customisations of RSX allows it to access the main memory (shared by PPU and SPU) at 20 GB/s in read direction and 15 GB/s in write direction. This means that even if it’s slower to render a scene in main memory instead of local memory, the bandwidth hit is rouhly 30%.
Now let’s compare these numbers with the ones from a 2015 PC architecture :
A Geforce 970 has 4 GB of video memory (x16) with a 200 GB/s bandwidth (x10), can process 5 Teraflop of data (x25) and support 8 render targets. However the theorical peak bandwidth of DD3 is 20 GB/s which is barely what the PS3 did offer in 2007.
The implication for emulation are strong : we can’t afford extra memory transfers between main memory and video memory.
RSX and Cell interaction
In all modern architecture CPU and GPU are executing independently from each others and PS3 is no exception.
RSX commands are 32 bits instructions puts in command buffers in a sequential maneer by Cell. There are 3 special command that are used to break the sequential flow of RSX, namely JUMP (mostly used to move from one command buffer to another) and CALL/RETURN pair (used to implement subroutines). Of course Cell needs to be able to prevent RSX to read commands faster than it can fill command buffer and thus RSX provides a « get » and a « put » register accessible from Cell. « Get » contains the memory address of the command the RSX is currently reading. « Put » can be written to by Cell, and is used as a « barrier », ie the RSX reads command only if Get and Put are different. Put register’s purpose is similar to glFlush in OpenGL.
RSX commands can be sorted in 3 categories :
- Commands that set RSX register. Like most ancient GPU the RSX doesn’t fetch non buffer inputs in memory but in hardware registers. This includes textures, buffers, render surface… description (their location, their format, their stride, …), vertex constants (RSX has 512 4×32 registers storage for vertex constants), some pipeline state related to blending operation or depth testing. I didn’t mention fragment constants (or pixel shader constant if you prefer D3D terminology) because it looks like there is no true storage for them, the cell has to « patch » fragment program/pixel shader in memory.
- Commands that issues actual rendering operations. So far I only saw 3 of them, a « clear surface » command that clear render targets (the clear value being stored in register), a « draw » command that issues an unindexed rendering call, an an « indexed draw » command that issues an indexed rendering call.
- Commands managing semaphores. Semaphore provide a more powerful sync mechanism than the get/put register. Semaphore are basically location in memory associated with a 32 bits values. A wait command can be used to make RSX hold execution until the semaphore values is the same as the expected one, for instance to allow Cell to complete buffer filling task. A release command can be used to make RSX writes a specific value to the semaphore location ; this way a Cell thread can be notified that RSX has finished with a given rendering command and is free to update buffers.