Performance analysis of mesh rendering

XanHou · Jul 12, 2013

I have done some performance testing on the rendering of opaque meshes.
This rendering causes about half of the lag when using the railgun. The other half is physics calculations.
Since physics is hard to optimize, I took a look at the rendering.

The code I looked at is mainly located in:
org.terasology.rendering.logic.MeshRenderer

Especially the render() method.

My conclusions:
1. Batch rendering as proposed by the commented code is slower than per mesh rendering as it is now.
The biggest performance hit is the amount of data that needs to be put in a buffer. Currently, for each mesh, no matter its size, 100 bytes are sent to the GPU by setting two uniforms (constants for a batch of vertices).
A simple cube consists of 6 sides, hence 12 triangles, hence 36 vertices, each using a total of 15 values for the location, normal, texture coordinates and color, and each value is 4 bytes. Hence for each simple cube 2160 bytes need to be sent to the GPU. More complex models need even more bytes, and we would need more bytes to fix the lighting as well! The only advantage is that these bytes can be processed without altering the state of the GPU, for which the GPU needs to finish all earlier processing.
There might be better solutions that will render the meshes in batches but I started looking at the per mesh rendering instead.
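To make the numbers in point 1 easy to check, here is the arithmetic as a small Java sketch. The class and method names are made up for illustration. The 15 values per vertex come from the figure above, and the 100 bytes are assumed to be one 4x4 and one 3x3 float matrix, as in the (16+9)*4 calculation in point 4.

```java
final class UploadCost {
    static final int BYTES_PER_VALUE = 4; // one 32-bit float

    // Per-mesh rendering: two uniforms, assumed to be a 4x4 and a 3x3 float matrix.
    static int uniformBytesPerMesh() {
        return (16 + 9) * BYTES_PER_VALUE;
    }

    // Batch rendering: every vertex of the cube has to be written to the buffer.
    static int vertexBytesPerCube() {
        int sides = 6;
        int triangles = sides * 2;  // two triangles per side
        int vertices = triangles * 3;
        int valuesPerVertex = 15;   // figure quoted in the post
        return vertices * valuesPerVertex * BYTES_PER_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("per mesh (uniforms): " + uniformBytesPerMesh() + " bytes");
        System.out.println("per cube (batched):  " + vertexBytesPerCube() + " bytes");
    }
}
```

So for a plain cube, batching moves roughly 21 times as many bytes as the two uniforms do.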

2. Matrix calculations are slow. For each mesh several matrices need to be calculated, and each mesh is checked for visibility. These calculations do not alter any existing data and could be multithreaded for a small but relevant and steady performance gain.

3. The light values are often the same for a lot of blocks. It is a major optimization to only switch the uniforms if the value actually changed. I can make a pull request for this once I have finished analyzing (I am writing this while doing so). I used the default Java HashMap rather than the Trove version, but I do not expect major performance issues for fewer than 20 elements in a HashMap (which is the case).
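A minimal sketch of the idea in point 3, assuming the uniforms are plain floats looked up by name. `UniformCache` and `UniformSink` are hypothetical names for illustration, not classes from the project; the sink stands in for whatever actually uploads the uniform.

```java
import java.util.HashMap;
import java.util.Map;

final class UniformCache {
    // Hypothetical stand-in for the call that really sends a uniform to the GPU.
    interface UniformSink {
        void setFloat(String name, float value);
    }

    private final Map<String, Float> lastValues = new HashMap<>();
    private final UniformSink sink;

    UniformCache(UniformSink sink) {
        this.sink = sink;
    }

    // Forwards the value only if it differs from the last value sent under this name.
    void setFloat(String name, float value) {
        Float previous = lastValues.get(name);
        if (previous != null && previous.floatValue() == value) {
            return; // unchanged, skip the upload
        }
        lastValues.put(name, value);
        sink.setFloat(name, value);
    }

    public static void main(String[] args) {
        int[] uploads = {0};
        UniformCache cache = new UniformCache((name, value) -> uploads[0]++);
        cache.setFloat("light", 0.5f);
        cache.setFloat("light", 0.5f); // same value, skipped
        cache.setFloat("light", 0.8f);
        System.out.println("uploads: " + uploads[0]);
    }
}
```

The cache would have to be cleared whenever the shader program changes, since the remembered values belong to one program's state.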

4. Setting the matrix uniforms takes a lot of time. It might be a major optimization to set the default modelview and normal matrices once and allow for additional location and rotation vectors. This reduces the number of bytes to send per mesh from (16+9)*4 = 100 to 3*2*4 = 24 bytes.
In the shaders the actual matrices need to be calculated, but that is a piece of cake.
Since this requires changes in the shaders I think it is best if someone who knows better how the shaders are managed in the project handles this. I know how to write the shaders, but I'm not sure if the shaders are also used elsewhere (? unit tests / dev tests ?), which would cause trouble.
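For reference, this is roughly the per-vertex work the shader would take over in point 4, written as plain Java so it can be checked on the CPU. It is only a sketch: it assumes the rotation vector holds Euler angles in radians applied in X, then Y, then Z order, which is my choice for the example and not something the engine prescribes. `PerMeshTransform` is a made-up name.

```java
final class PerMeshTransform {
    // Two vec3 uniforms (location and rotation) instead of two matrices.
    static final int BYTES_PER_MESH = 3 * 2 * 4;

    // Rotates p around X, then Y, then Z (angles in radians), then adds the location.
    static float[] transform(float[] p, float[] location, float[] rotation) {
        double x = p[0], y = p[1], z = p[2];

        double cx = Math.cos(rotation[0]), sx = Math.sin(rotation[0]);
        double y1 = cx * y - sx * z;
        double z1 = sx * y + cx * z;

        double cy = Math.cos(rotation[1]), sy = Math.sin(rotation[1]);
        double x2 = cy * x + sy * z1;
        double z2 = -sy * x + cy * z1;

        double cz = Math.cos(rotation[2]), sz = Math.sin(rotation[2]);
        double x3 = cz * x2 - sz * y1;
        double y3 = sz * x2 + cz * y1;

        return new float[] {
            (float) (x3 + location[0]),
            (float) (y3 + location[1]),
            (float) (z2 + location[2])
        };
    }

    public static void main(String[] args) {
        float[] moved = transform(
            new float[] {1f, 0f, 0f},
            new float[] {1f, 2f, 3f},
            new float[] {0f, 0f, (float) (Math.PI / 2)});
        System.out.println(moved[0] + ", " + moved[1] + ", " + moved[2]);
    }
}
```

The result would still have to be multiplied by the shared modelview matrix that is set once per frame.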

~~An other optimization could be to have a separate mesh draw distance.~~
~~Meshes that are far away from the player would not even be considered for drawing.~~
An even better option would be to make this distance dynamic and let the player only choose a minimum distance. The renderer will then keep drawing for a specified amount of time, say 30ms, after which it will skip the remaining meshes. It will draw the meshes closest to the player first and will ignore the time limit if there are still meshes within the minimum draw distance.
~~This is especially useful for multiplayer, where one player could cause hundreds of meshes to be spawned, causing lag spikes for everyone else.~~
EDIT: I just realized that this cannot be done the way it is described above. Meshes are drawn per material for optimization. While it is still possible to set a maximum draw distance and it is possible to only draw the nearest X meshes, it is not possible to keep drawing for Y ms.
In short, I suggest that we add an option to the video settings that limits the number of meshes drawn to the screen to a constant number. The nearest meshes will then be drawn, making the impact on the look and feel less significant.
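A rough sketch of the "nearest X meshes" selection, assuming it runs before the meshes are grouped per material. `MeshInstance` is a hypothetical record for illustration, not a class from the project, and the limit would come from the proposed video setting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NearestMeshes {
    // Hypothetical minimal mesh entry: a name and a world position.
    static final class MeshInstance {
        final String name;
        final float x, y, z;

        MeshInstance(String name, float x, float y, float z) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.z = z;
        }

        // Squared distance is enough for ordering and avoids a square root.
        double distanceSquaredTo(float px, float py, float pz) {
            double dx = x - px, dy = y - py, dz = z - pz;
            return dx * dx + dy * dy + dz * dz;
        }
    }

    // Returns at most 'limit' meshes, nearest to the given position first.
    static List<MeshInstance> nearest(List<MeshInstance> meshes,
                                      float px, float py, float pz, int limit) {
        List<MeshInstance> sorted = new ArrayList<>(meshes);
        sorted.sort(Comparator.comparingDouble(m -> m.distanceSquaredTo(px, py, pz)));
        int count = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        List<MeshInstance> meshes = new ArrayList<>();
        meshes.add(new MeshInstance("far", 100f, 0f, 0f));
        meshes.add(new MeshInstance("near", 1f, 0f, 0f));
        meshes.add(new MeshInstance("mid", 10f, 0f, 0f));
        for (MeshInstance m : nearest(meshes, 0f, 0f, 0f, 2)) {
            System.out.println(m.name);
        }
    }
}
```

Sorting every frame costs O(n log n). For large mesh counts a partial selection would be cheaper, but a full sort keeps the sketch short.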

ps. For those who want to create havoc with their railgun in the meantime, you can put a hard time limit on the rendering of meshes. I used System.nanoTime() and checked on every iteration of the loop in the render() method whether a 30ms limit had been reached. This is too much hax to put in the final game though. Note that there is a major difference between this and a mesh draw distance. In my hax solution the meshes that were spawned first are drawn; in a proper implementation the nearest meshes would be drawn.
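The hack looks roughly like this. It is a standalone sketch, not the actual render() code: each draw call is modelled as a `Runnable`, and `TimeBudgetLoop` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeBudgetLoop {
    // Runs draw calls in list order until the time budget is used up.
    // Returns how many were executed; the rest are skipped for this frame.
    static int renderWithBudget(List<Runnable> drawCalls, long budgetNanos) {
        long start = System.nanoTime();
        int drawn = 0;
        for (Runnable draw : drawCalls) {
            if (System.nanoTime() - start >= budgetNanos) {
                break; // budget exhausted
            }
            draw.run();
            drawn++;
        }
        return drawn;
    }

    public static void main(String[] args) {
        List<Runnable> drawCalls = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            drawCalls.add(() -> { /* a real draw call would go here */ });
        }
        long budgetNanos = 30L * 1_000_000L; // 30 ms
        System.out.println("drawn: " + renderWithBudget(drawCalls, budgetNanos));
    }
}
```

Because the loop walks the list in its existing order, whatever comes first in the list gets drawn, which is exactly the weakness described above.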

Cervator · Jul 14, 2013

Paging begla and/or Immortius for the possibility of more feedback than a /like

(Too wizardly for me!)
