Optimization!

begla

Project Founder and Lead Developer
Contributor
Architecture
Logistics
Let's start a new thread discussing some possible optimizations. Immortius already contributed a nice profiler and some tweaks. So it's my turn... :geek:

I've finally fixed the stuttering issues that showed up on my machine when using the largest viewing distance and playing for a short while. The cause is quite obvious in hindsight: the many VBOs used to store the chunk vertices on the GPU take up a significant amount of memory. I profiled my GPU's VRAM while playing and used our nice new profiler to track down the peaks. This led me to the cause: the VBO draw calls. By that point many VBOs had been moved to system memory, so accessing and rendering some of the cached chunks became REALLY slow.

As a solution I've added a mechanism that limits the number of VBOs stored on the GPU at any one time. If the limit is exceeded, the affected chunks get marked as dirty and the corresponding VBOs are deleted. I've set the limit to 512 VBOs, which is enough to display all the chunks (and some more) in the view frustum at the largest viewing distance. If the player starts to look around, some VBOs have to be recreated from the voxel data.
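A minimal sketch of such a limit, not the actual Terasology code. The class `VboBudget` and its methods are hypothetical names, and evicting the least recently drawn chunk first is an assumption, since the post does not say which VBOs get dropped. A real renderer would delete the GL buffer and mark the chunk dirty where the sketch only records the evicted key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tracks which chunks currently own a VBO and evicts the
// least recently drawn one once the budget is exceeded.
class VboBudget<K> {
    private final List<K> evicted = new ArrayList<>();
    private final LinkedHashMap<K, Integer> live;

    VboBudget(final int maxVbos) {
        // accessOrder = true keeps the map ordered from least to most recently used.
        this.live = new LinkedHashMap<K, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Integer> eldest) {
                if (size() > maxVbos) {
                    // Assumption: the caller deletes the GL buffer and marks
                    // the chunk dirty for every key reported here.
                    evicted.add(eldest.getKey());
                    return true;
                }
                return false;
            }
        };
    }

    // Register a freshly created VBO for a chunk; may trigger one eviction.
    void onVboCreated(K chunk, int vboId) {
        live.put(chunk, vboId);
    }

    // Call when a chunk is drawn so it counts as recently used.
    void onDrawn(K chunk) {
        live.get(chunk);
    }

    boolean hasVbo(K chunk) {
        return live.containsKey(chunk);
    }

    // Returns and clears the chunks whose VBOs should now be deleted.
    List<K> drainEvicted() {
        List<K> out = new ArrayList<>(evicted);
        evicted.clear();
        return out;
    }
}
```

With a budget of 512 this would behave like the limit described above: chunks that fall out of view stop being drawn, drift to the front of the eviction order, and lose their VBOs first.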

I've been running a demo flight for almost an hour now and my VRAM usage has not exceeded 600 MB. Before, it rose to 1024 MB and started to fill up my system memory. :)
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Nice! There are always more optimizations hiding out there somehow :)
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
One thing I just discovered:

I was messing with the entity system experiment branch, and noticed I was getting a lot of performance spikes resulting in stutter. I tracked it down to calls to getBlockPosition() when chunks weren't already loaded: the chunks would be immediately created or loaded from disk. I was able to relieve the stutter by simply not processing character movements for characters not on a loaded chunk, but in general I think this needs to be managed a little more carefully, especially as we move to multiplayer, where an inaccessible chunk cannot be generated or loaded on the client end.
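A rough sketch of that guard, under stated assumptions: `LoadedChunks` and `MovementSystem` are hypothetical stand-ins, not classes from the entity system branch, and the 16x16 horizontal chunk size is assumed. The point is only that the lookup answers "is it loaded?" without ever creating or loading a chunk as a side effect.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry of loaded chunks. Lookups never trigger generation or disk I/O.
class LoadedChunks {
    static final int CHUNK_SIZE_X = 16; // assumed horizontal chunk size
    static final int CHUNK_SIZE_Z = 16;

    private final Set<Long> loaded = new HashSet<>();

    // Packs the two chunk coordinates into one long key.
    private static long key(int chunkX, int chunkZ) {
        return ((long) chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
    }

    void markLoaded(int chunkX, int chunkZ) {
        loaded.add(key(chunkX, chunkZ));
    }

    boolean isLoadedAt(double worldX, double worldZ) {
        // floorDiv keeps negative world coordinates in the correct chunk.
        int chunkX = Math.floorDiv((int) Math.floor(worldX), CHUNK_SIZE_X);
        int chunkZ = Math.floorDiv((int) Math.floor(worldZ), CHUNK_SIZE_Z);
        return loaded.contains(key(chunkX, chunkZ));
    }
}

// Hypothetical movement step that skips characters standing on unloaded chunks.
class MovementSystem {
    // Returns true if the character was simulated this tick.
    static boolean tryUpdate(LoadedChunks chunks, double x, double z) {
        if (!chunks.isLoadedAt(x, z)) {
            return false; // leave the character frozen until its chunk arrives
        }
        // ... actual movement and collision handling would go here ...
        return true;
    }
}
```

On a multiplayer client the same check would still apply; only the source of "loaded" changes from local generation to chunks received from the server.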
 

Kai Kratz

Witch Doctor
Contributor
Architecture
Logistics
Since I'm currently working on chunk caching, I'll add my 2 cents.

First of all, I did mess up my performance measurement, and as it stands loading chunks from disk is one order of magnitude faster than decompressing. So, forget compression? No, not really: I don't like 2 GB save games clogging up my disk, and I guess neither do you. So the next thing I'll do is investigate using compressed storage for saved worlds, with memory-mapped files as a near chunk cache and compressed storage as a far chunk cache. What does bother me is that in-memory compression as chunk storage did speed up chunk display on my home machine.

Here is a quick benchmark result (read/write of 500 pregenerated chunks) from my home machine:
Code:
class org.terasology.logic.world.ChunkCacheGZip in memory cache size is 297677 bytes
class org.terasology.logic.world.ChunkCacheGZip write time per chunk is 5.818000 ms
class org.terasology.logic.world.ChunkCacheGZip read time per chunk is 4.348000 ms

class org.terasology.logic.world.ChunkCacheUncompressed in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheUncompressed write time per chunk is 0.000000 ms
class org.terasology.logic.world.ChunkCacheUncompressed read time per chunk is 0.002000 ms

class org.terasology.logic.world.ChunkCacheDeflate in memory cache size is 291677 bytes
class org.terasology.logic.world.ChunkCacheDeflate write time per chunk is 5.542000 ms
class org.terasology.logic.world.ChunkCacheDeflate read time per chunk is 3.932000 ms

class org.terasology.logic.world.ChunkCacheFileSystem in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheFileSystem write time per chunk is 9.476000 ms
class org.terasology.logic.world.ChunkCacheFileSystem read time per chunk is 0.452000 ms
Sorry guys for judging too fast on this one, I should be more patient and thorough in my testing / benchmarking.

However, this led me to poke around some more (since the slow reloading of geometry still annoys me so much), and I noticed that some of the computations seem to cost a lot of time: according to jvisualvm we are spending quite some time in light and vertex position calculation. Now I'm wondering if we shouldn't compute this just once. Any feedback would be very welcome. I'm still trying to figure out the whole chunk system.

In a general sense I would like to reach a state where the player is more or less unaware that the world gets loaded in the background (no visual artifacts of reloading), which will probably involve pre-caching more of the world at startup (that might help Immortius).

On a side note: playing with compression at least assured me that we can stream the world really easily for net-play. In Java this is just another stream wrapper.
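To illustrate the "just another stream wrapper" remark, here is a small sketch using the standard `java.util.zip` stream classes. The class name `ChunkStreams` is made up for this example, and a byte array stands in for the socket or file stream that real net-play or saving would wrap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: compresses and restores raw chunk bytes by wrapping plain streams.
class ChunkStreams {
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper flushes the remaining compressed data into the sink.
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                raw.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }
}
```

Swapping `DeflaterOutputStream` / `InflaterInputStream` for `GZIPOutputStream` / `GZIPInputStream` gives the gzip variant with the same calling code.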

I'm not giving up on this easily, I really want smoother loading. If I were just playing the game, the current state would really deter me from playing.
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Well, keep it up! Knowledge is power, and we'll move forward with optimization more than we'll hit setbacks over the long run :)

I notice from the debug stats that memory usage climbs and GCs at a mad pace. Will it help any to try stabilizing that, even if we raise overall memory usage by keeping more chunks in memory than visible?
 

Kai Kratz

Witch Doctor
Contributor
Architecture
Logistics
And again I have to report a mistake:

There was a subtle bug in the measurement I posted above. The test was unable to read the majority of files and just ignored them, silently...

This is the corrected measurement:
Code:
class org.terasology.logic.world.ChunkCacheGZip in memory cache size is 297677 bytes
class org.terasology.logic.world.ChunkCacheGZip write time per chunk is 5.592000 ms
class org.terasology.logic.world.ChunkCacheGZip read time per chunk is 4.304000 ms

class org.terasology.logic.world.ChunkCacheUncompressed in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheUncompressed write time per chunk is 0.000000 ms
class org.terasology.logic.world.ChunkCacheUncompressed read time per chunk is 0.002000 ms

class org.terasology.logic.world.ChunkCacheDeflate in memory cache size is 291677 bytes
class org.terasology.logic.world.ChunkCacheDeflate write time per chunk is 5.522000 ms
class org.terasology.logic.world.ChunkCacheDeflate read time per chunk is 3.710000 ms

class org.terasology.logic.world.ChunkCacheFileSystem in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheFileSystem write time per chunk is 11.866000 ms
class org.terasology.logic.world.ChunkCacheFileSystem read time per chunk is 25.840000 ms
And one more for the latest commit from my branch:
Code:
class org.terasology.logic.world.ChunkCacheGZip in memory cache size is 307256 bytes
class org.terasology.logic.world.ChunkCacheGZip write time per chunk is 5.678000 ms
class org.terasology.logic.world.ChunkCacheGZip read time per chunk is 3.904000 ms

class org.terasology.logic.world.ChunkCacheUncompressed in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheUncompressed write time per chunk is 0.002000 ms
class org.terasology.logic.world.ChunkCacheUncompressed read time per chunk is 0.000000 ms

class org.terasology.logic.world.ChunkCacheDeflate in memory cache size is 301256 bytes
class org.terasology.logic.world.ChunkCacheDeflate write time per chunk is 5.214000 ms
class org.terasology.logic.world.ChunkCacheDeflate read time per chunk is 3.792000 ms

class org.terasology.logic.world.ChunkCacheFileSystem in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheFileSystem write time per chunk is 6.382000 ms
class org.terasology.logic.world.ChunkCacheFileSystem read time per chunk is 25.115999 ms
The read times are not really stable. I'm not sure whether I should investigate further and compute the read time variance over a longer series of reads, but for now I tend to ignore it, since I am now sure it is a good idea to use in-memory compression (deflate) for our chunks.
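If the variance does turn out to matter, it can be tracked without storing every sample. The sketch below uses Welford's online algorithm; `TimingStats` is a hypothetical helper and not part of ChunkCachePerformanceTest.

```java
// Hypothetical helper: running mean and sample variance of per-chunk timings
// in milliseconds, computed with Welford's online algorithm.
class TimingStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    void add(double sampleMs) {
        count++;
        double delta = sampleMs - mean;
        mean += delta / count;
        m2 += delta * (sampleMs - mean);
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    // Sample variance (divides by n - 1); 0 when fewer than two samples exist.
    double variance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }

    double stdDev() {
        return Math.sqrt(variance());
    }
}
```

A benchmark loop would call `add((System.nanoTime() - start) / 1e6)` after each read and print `mean()` and `stdDev()` at the end, which makes unstable read times visible as a large standard deviation.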

Your remarks are as always very welcome :)

EDIT:

Tested it on a different PC that can actually run Terasology (same overall result):
Code:
class org.terasology.logic.world.ChunkCacheGZip in memory cache size is 1226104 bytes
class org.terasology.logic.world.ChunkCacheGZip write time per chunk is 2.770500 ms
class org.terasology.logic.world.ChunkCacheGZip read time per chunk is 1.665000 ms

class org.terasology.logic.world.ChunkCacheUncompressed in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheUncompressed write time per chunk is 0.000500 ms
class org.terasology.logic.world.ChunkCacheUncompressed read time per chunk is 0.001000 ms

class org.terasology.logic.world.ChunkCacheDeflate in memory cache size is 1202104 bytes
class org.terasology.logic.world.ChunkCacheDeflate write time per chunk is 2.629000 ms
class org.terasology.logic.world.ChunkCacheDeflate read time per chunk is 1.407000 ms

class org.terasology.logic.world.ChunkCacheFileSystem in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheFileSystem write time per chunk is 5.456500 ms
class org.terasology.logic.world.ChunkCacheFileSystem read time per chunk is 5.697000 ms
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Crunched the test on my 9GB DDR3 machine with SIZE 5000 :)

class org.terasology.logic.world.ChunkCacheGZip in memory cache size is 3066153 bytes
class org.terasology.logic.world.ChunkCacheGZip write time per chunk is 3.107000 ms
class org.terasology.logic.world.ChunkCacheGZip read time per chunk is 2.065200 ms

class org.terasology.logic.world.ChunkCacheUncompressed in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheUncompressed write time per chunk is 0.001600 ms
class org.terasology.logic.world.ChunkCacheUncompressed read time per chunk is 0.000400 ms

class org.terasology.logic.world.ChunkCacheDeflate in memory cache size is 3006153 bytes
class org.terasology.logic.world.ChunkCacheDeflate write time per chunk is 2.735000 ms
class org.terasology.logic.world.ChunkCacheDeflate read time per chunk is 1.652000 ms

class org.terasology.logic.world.ChunkCacheFileSystem in memory cache size is 0 bytes
class org.terasology.logic.world.ChunkCacheFileSystem write time per chunk is 4.721400 ms
class org.terasology.logic.world.ChunkCacheFileSystem read time per chunk is 2.689800 ms

Edit: After a quick bit of discussion on IRC I understand the stats a little better and want to share!

The test loads x chunks into memory, where x is the SIZE variable of ChunkCachePerformanceTest.

ChunkCacheGZip and ChunkCacheDeflate use different in-memory compression strategies, so here 5k chunks are stored in 3 MB of memory, which is pretty dang tiny. The difference between the two is pretty small. ChunkCacheUncompressed stores chunks in memory without compression and is hella-fast, but also eats a ton of memory (781ish MB?). Finally, ChunkCacheFileSystem reads chunks from disk and is slower than either compressed in-memory cache, tho far less so on my supposedly beefy PC than on Kai's struggling laptop.

So the takeaway is that if we use a secondary compressed memory cache to supplement a small uncompressed core memory cache, we can keep a huge area loaded without having to bother with disk I/O.
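One way the two tiers could fit together, as a sketch only: `TwoTierChunkCache` is a hypothetical class, chunks are plain byte arrays keyed by a long id, and the least-recently-used demotion policy is an assumption rather than what was merged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical cache: a small uncompressed "hot" tier backed by a deflate-compressed "cold" tier.
class TwoTierChunkCache {
    private final int hotCapacity;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<Long, byte[]> hot = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<Long, byte[]> cold = new HashMap<>();

    TwoTierChunkCache(int hotCapacity) {
        this.hotCapacity = hotCapacity;
    }

    void put(long id, byte[] raw) {
        cold.remove(id);
        hot.put(id, raw);
        // Demote least recently used chunks into the compressed tier.
        while (hot.size() > hotCapacity) {
            Iterator<Map.Entry<Long, byte[]>> it = hot.entrySet().iterator();
            Map.Entry<Long, byte[]> eldest = it.next();
            cold.put(eldest.getKey(), deflate(eldest.getValue()));
            it.remove();
        }
    }

    // Returns the raw chunk bytes, promoting from the cold tier if needed; null if unknown.
    byte[] get(long id) {
        byte[] raw = hot.get(id);
        if (raw != null) {
            return raw;
        }
        byte[] packed = cold.remove(id);
        if (packed == null) {
            return null;
        }
        raw = inflate(packed);
        put(id, raw);
        return raw;
    }

    boolean isHot(long id) { return hot.containsKey(id); }
    int hotSize() { return hot.size(); }
    int coldSize() { return cold.size(); }

    private static byte[] deflate(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    private static byte[] inflate(byte[] packed) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                raw.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }
}
```

Going by the numbers above, a cold-tier hit would cost a millisecond or two of inflating per chunk, which is the price paid for keeping thousands of chunks resident in a few megabytes.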
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Merged to develop! Great stuff :)

Saw some fun quirks that aren't at all problematic, but notable anyway - sometimes more distant chunks load sooner than closer ones :D

Also, not sure if it has gotten more pronounced lately, but it feels like I'm looking at the world through a fish-eye lens. I find myself examining distant objects by putting them on the edge of my vision, as they appear vastly closer that way. I think we've had a player or two comment on that too. Can we tone that down? Maybe the level of distant blur too (fish-eye lens with below prescription strength!). You can even see a circular darkening around the corners of the window :)
 
