Last month I interviewed for a 3D developer position at a couple of companies near Paris. My resume mentions that I have already written DX12 and Vulkan code, so, as expected, the interviewers eventually asked for my opinion on DX12 and whether I had seen performance improvements from using the API. I admitted that my experience is biased: RPCS3’s initial GL implementation was not optimal, so the performance figures I got are not representative of what can be achieved with DX12.
What surprised me is that my interviewers didn’t show much interest in Vulkan. Game development studios in Paris are mostly mid-sized, with between 20 and 70 full-time developers working on several (often niche) titles at the same time for bigger publishers. Although they are big enough to maintain their own 3D engine, they can’t invest as much money as the big names in the industry. Hence technical choices are often dictated by time and budget constraints.
The first company I interviewed with (which ships games on PC, PS4 and Xbox One) explained that adding a new rendering backend is costly and that they wouldn’t support Vulkan unless it turned out to be the API of Nintendo’s new console, which was still unnamed at the time (it turns out that’s not the case: the Nintendo Switch uses a custom API, NVN. As far as I know Nintendo has always used custom graphics APIs). With around 1% of market share, Linux wasn’t considered a viable target; neither was Android, despite being the first or second most used platform. I suspect the poor state of 3D graphics in the Android ecosystem is to blame here, and the web is full of horror stories when it comes to OpenGL ES programming. I think a lot of the enthusiasm surrounding Vulkan comes from the promise of more reliable drivers on Android, thanks to the thinner layer of abstraction and the externalization of debug checks. SoC vendors also seem more inclined to improve the situation, with Samsung heavily promoting Vulkan support through footage of an early Unreal Engine 4 demo and through partnerships with publishers to bring Vulkan versions of NFS No Limits and Vainglory to Google Play.
The second company’s engine already has an OpenGL codepath, which enables Mac support. OpenGL and Vulkan are very different in design, but they at least share the same shader language (GLSL), which eases any potential porting work (still a big task, Vulkan being a very verbose API). However, Vulkan never came up during the interview; the interviewers were more interested in what DX12 could bring. The catch is that porting a DX11 engine to DX12 is probably as much work as porting it to Vulkan, since both APIs rely on the same GPU abstraction; and while DX12 has a one-year head start in terms of tool and driver support, it is hindered by its OS requirements. Although Windows 10 has already gained around 25% of OS market share, 60% of the market is still running Windows 7 or 8.x, and the free upgrade offer has ended, which slows adoption.
On the other hand, in both interviews it was clear that support for a low-overhead API wasn’t a very high priority. Their engines only recently made the switch to DX11 to reach parity with PS4 and Xbox One, and the developers don’t want to start a new development cycle right now. That holds at least for Windows; on OSX, the only way to use compute shaders, the most popular feature of the DX11 hardware generation, is through the Metal API. OpenGL on OSX has been stuck at version 4.1 since 2012 and there is no sign that this will change soon. Metal’s growing popularity will reinforce iOS’s lead in the smartphone gaming market, so Apple has no incentive to support the low-level API used by its main competitor.
I’m assuming the reader has some knowledge of graphics programming (with D3D or GL) and generally knows how modern CPUs work.
The specs of RSX
RSX is the graphics processor of the PS3. The acronym stands for Reality Synthesizer according to Wikipedia. It’s based on Nvidia’s GeForce 7800 GTX, a DirectX 9 class GPU, slightly modified to allow Cell’s SPUs to do image processing. It has 256 MB of local memory (PS3 terminology for “video memory”) with a bandwidth of 22.4 GB/s, according to Wikipedia again. Its processing power is 228 GFLOPS and it supports up to 4 render targets.
The aforementioned customisations allow RSX to access main memory (shared by the PPU and the SPUs) at 20 GB/s for reads and 15 GB/s for writes. This means that although rendering a scene to main memory is slower than rendering to local memory, the bandwidth hit is modest: roughly 10% for reads and roughly 30% for writes.
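The bandwidth hit can be checked with a few lines of arithmetic. This is only a sketch that takes the bandwidth figures quoted above at face value; the constant and function names are made up for the illustration.

```python
# Figures quoted in the text (GB/s); treated as given, not measured.
LOCAL_BW = 22.4        # RSX local (video) memory
MAIN_READ_BW = 20.0    # RSX reading from main memory
MAIN_WRITE_BW = 15.0   # RSX writing to main memory


def bandwidth_hit(main_bw, local_bw=LOCAL_BW):
    """Fraction of bandwidth lost when using main memory instead of local memory."""
    return 1.0 - main_bw / local_bw


read_hit = bandwidth_hit(MAIN_READ_BW)
write_hit = bandwidth_hit(MAIN_WRITE_BW)
print(f"read hit:  {read_hit:.0%}")
print(f"write hit: {write_hit:.0%}")
```

Rendering mostly writes to the target, so the write-side figure is the one that matters when a scene is rendered into main memory.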
Now let’s compare these numbers with the ones from a 2015 PC architecture:
A GeForce GTX 970 has 4 GB of video memory (x16) with a bandwidth of 200 GB/s (roughly x9), delivers about 5 TFLOPS (roughly x22) and supports 8 render targets. However, the theoretical peak bandwidth of DDR3 is 20 GB/s, which is barely what the PS3 offered in 2007.
The implication for emulation is strong: we can’t afford extra memory transfers between main memory and video memory.
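To make the comparison concrete, the multipliers can be recomputed from the figures quoted above. This is a rough sketch; the dictionary keys are names chosen for this example, and the numbers are the ones from the text, not fresh measurements.

```python
# Numbers as quoted in the text; units: MB, GB/s, GFLOPS, render target count.
rsx = {"memory_mb": 256, "bandwidth_gbs": 22.4, "gflops": 228, "render_targets": 4}
gtx970 = {"memory_mb": 4096, "bandwidth_gbs": 200.0, "gflops": 5000, "render_targets": 8}

# How many times larger each PC figure is than its RSX counterpart.
ratios = {key: gtx970[key] / rsx[key] for key in rsx}
for key, ratio in ratios.items():
    print(f"{key}: x{ratio:.1f}")

# CPU <-> GPU path: DDR3 peak versus RSX read bandwidth to main memory.
DDR3_PEAK_GBS = 20.0
RSX_MAIN_READ_GBS = 20.0
main_memory_ratio = DDR3_PEAK_GBS / RSX_MAIN_READ_GBS
print(f"main memory path: x{main_memory_ratio:.1f}")
```

Everything on the GPU side grew by an order of magnitude, while the path between main memory and the GPU did not grow at all, which is why extra transfers across it are so expensive for an emulator.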
RSX and Cell interaction
In all modern architectures the CPU and the GPU execute independently from each other, and the PS3 is no exception.
RSX commands are 32-bit instructions that Cell writes sequentially into command buffers. Three special commands break the sequential flow of RSX: JUMP (mostly used to move from one command buffer to another) and the CALL/RETURN pair (used to implement subroutines). Of course, Cell needs to be able to prevent RSX from reading commands faster than it can fill the command buffer, so RSX provides a “get” and a “put” register accessible from Cell. “Get” contains the memory address of the command RSX is currently reading. “Put” can be written to by Cell and is used as a “barrier”, i.e. RSX reads commands only while Get and Put are different. The Put register’s purpose is similar to glFlush in OpenGL.
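The get/put mechanism and the three flow-control commands can be modelled in a few lines. The sketch below is a toy, not the real RSX encoding: commands are Python tuples instead of 32-bit words, addresses are plain integers, the class and field names are invented, and a single return slot for CALL is an assumption made to keep it short.

```python
class ToyRsxFifo:
    """Toy model of the get/put command fetch loop (hypothetical encoding)."""

    def __init__(self, memory):
        self.memory = memory          # address -> (opcode, argument) tuple
        self.get = 0                  # address of the command read next
        self.put = 0                  # written by the CPU side; fetch stops when get == put
        self.return_address = None    # single-level CALL/RETURN (assumption)
        self.executed = []            # log of ordinary commands, for inspection

    def run(self):
        """Consume commands until get catches up with put."""
        while self.get != self.put:
            op, arg = self.memory[self.get]
            if op == "JUMP":
                self.get = arg
            elif op == "CALL":
                self.return_address = self.get + 1
                self.get = arg
            elif op == "RETURN":
                self.get = self.return_address
                self.return_address = None
            else:
                self.executed.append((op, arg))
                self.get += 1


memory = {
    0: ("SET", "a"),
    1: ("CALL", 10),      # subroutine at address 10
    2: ("SET", "b"),
    3: ("JUMP", 20),      # continue in another command buffer
    10: ("SET", "sub"),
    11: ("RETURN", None),
    20: ("SET", "c"),
    21: ("SET", "d"),
}

fifo = ToyRsxFifo(memory)
fifo.put = 2              # CPU publishes the first two commands
fifo.run()
first_batch = list(fifo.executed)

fifo.put = 22             # CPU publishes the rest
fifo.run()
```

After the first `run()` the model stops at address 2 even though more commands already sit in memory: nothing past Put is read until the CPU side moves Put forward, which is the glFlush-like behaviour described above.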
RSX commands can be sorted into 3 categories:
Commands that set RSX registers. Like most older GPUs, the RSX doesn’t fetch non-buffer inputs from memory but from hardware registers. This includes the description of textures, buffers, render surfaces, … (their location, their format, their stride, …), vertex constants (RSX has storage for 512 4×32-bit registers for vertex constants), and some pipeline state related to blending or depth testing. I didn’t mention fragment constants (or pixel shader constants if you prefer D3D terminology) because it looks like there is no true storage for them: Cell has to “patch” the fragment program/pixel shader in memory.
Commands that issue actual rendering operations. So far I have only seen 3 of them: a “clear surface” command that clears render targets (the clear value being stored in a register), a “draw” command that issues an unindexed rendering call, and an “indexed draw” command that issues an indexed rendering call.
Commands that manage semaphores. Semaphores provide a more powerful sync mechanism than the get/put registers. A semaphore is basically a location in memory associated with a 32-bit value. A wait command makes RSX hold execution until the semaphore value matches the expected one, for instance to let Cell complete a buffer filling task. A release command makes RSX write a specific value to the semaphore location; this way a Cell thread can be notified that RSX has finished with a given rendering command and that it is free to update buffers.
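The three categories can be sketched as a small dispatcher. Everything here is hypothetical: the opcode names, the register name, the semaphore offsets and the choice of 0 as the initial semaphore value are made up for the illustration and do not reflect the real RSX command set.

```python
class ToyRsx:
    """Toy dispatcher for the three command categories (hypothetical opcodes)."""

    def __init__(self):
        self.registers = {}    # category 1: register name -> value
        self.draw_log = []     # category 2: rendering operations issued
        self.semaphores = {}   # category 3: memory offset -> 32-bit value
        self.pc = 0            # index of the next command to execute

    def run(self, commands):
        """Execute from self.pc; return False if blocked on a semaphore wait."""
        while self.pc < len(commands):
            op, *args = commands[self.pc]
            if op == "SET_REGISTER":
                name, value = args
                self.registers[name] = value
            elif op in ("CLEAR_SURFACE", "DRAW", "INDEXED_DRAW"):
                # A rendering command uses whatever the registers hold right now.
                self.draw_log.append((op, dict(self.registers)))
            elif op == "SEMAPHORE_WAIT":
                offset, expected = args
                # Assumption: a semaphore never written reads as 0.
                if self.semaphores.get(offset, 0) != expected:
                    return False           # stay on this command and retry later
            elif op == "SEMAPHORE_RELEASE":
                offset, value = args
                self.semaphores[offset] = value & 0xFFFFFFFF
            else:
                raise ValueError(f"unknown command: {op}")
            self.pc += 1
        return True


commands = [
    ("SEMAPHORE_WAIT", 0x10, 1),             # wait until the CPU has filled the buffer
    ("SET_REGISTER", "vertex_buffer_offset", 0x4000),
    ("DRAW",),
    ("SEMAPHORE_RELEASE", 0x20, 7),          # tell the CPU the buffer can be reused
]

rsx = ToyRsx()
finished_first_try = rsx.run(commands)       # blocked: semaphore 0x10 is still 0
rsx.semaphores[0x10] = 1                     # CPU side signals that the buffer is ready
finished_second_try = rsx.run(commands)
```

The first call stops at the wait command without touching any register; once the CPU side writes the expected value, the draw goes through and the release value appears at offset 0x20, where a Cell thread could poll for it.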