Performance survey

begla

Project Founder and Lead Developer
Contributor
Architecture
Logistics
I have just pushed some new changes mostly related to the problems I've described earlier. I've also changed the way the daylight value is calculated. From now on it is based on the angle of the sun provided by Anton's skysphere. This allows very smooth transitions at dawn/sunset and a much more fitting daylight value at each point in time.
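A minimal sketch of how a daylight value could be derived from the sun angle, for readers who want to picture the idea. The class name, the twilight width and the minimum daylight value are assumptions for illustration and are not taken from the Blockmania source or from Anton's skysphere code. It uses a smoothstep blend so the value changes gently around the horizon.

```java
/** Hypothetical sketch: maps a sun elevation angle to a daylight factor. */
final class DaylightSketch {
    // Assumed tuning values, not taken from the Blockmania source.
    static final double MIN_DAYLIGHT = 0.1;       // floor so nights stay dark but not pitch black
    static final double TWILIGHT_DEGREES = 12.0;  // half-width of the dawn/sunset blend

    /**
     * @param sunElevationDegrees angle of the sun above (+) or below (-) the horizon
     * @return daylight factor between MIN_DAYLIGHT and 1.0
     */
    static double daylight(double sunElevationDegrees) {
        // Map [-TWILIGHT, +TWILIGHT] onto [0, 1] and clamp everything outside.
        double t = (sunElevationDegrees + TWILIGHT_DEGREES) / (2.0 * TWILIGHT_DEGREES);
        t = Math.max(0.0, Math.min(1.0, t));
        // Smoothstep: zero slope at both ends, so there is no visible jump.
        double smooth = t * t * (3.0 - 2.0 * t);
        return MIN_DAYLIGHT + (1.0 - MIN_DAYLIGHT) * smooth;
    }
}
```

Because the value depends only on the angle, any change to the length of the day/night cycle would carry over to the lighting without extra work.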

But now the important part... The performance of Blockmania. :) I added a more general optimization yesterday (reduced the number of leaf blocks/faces that are actually drawn). Today I finally found out why the game stuttered from time to time on my machine. Currently each chunk update is executed in a separate thread (making this possible was a lot of work actually) and the number of running threads is limited by the number of cores of the user's CPU. If this limit is reached, no more updates can be queued. This led to problems when the player modifies blocks right in front of him, since it introduced a noticeable delay. So I implemented an observer-pattern kind of mechanism that allows forced chunk updates (utilizing one more thread than actually allowed). That is the system that has been active for the last couple of weeks.
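The limit-plus-bypass idea described above can be sketched roughly as follows. This is not the observer-based mechanism from the game. It only illustrates a cap on concurrent updates with a forced path that ignores the cap, and all names are hypothetical.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: caps concurrent chunk updates, with a forced bypass. */
final class ChunkUpdateSchedulerSketch {
    private final Semaphore slots;
    private final Executor executor;

    ChunkUpdateSchedulerSketch(int maxConcurrentUpdates, Executor executor) {
        this.slots = new Semaphore(maxConcurrentUpdates);
        this.executor = executor;
    }

    /** Regular update: rejected (returns false) while all slots are busy. */
    boolean queueUpdate(Runnable update) {
        if (!slots.tryAcquire()) {
            return false; // limit reached, the caller can retry on a later frame
        }
        executor.execute(() -> {
            try {
                update.run();
            } finally {
                slots.release(); // free the slot even if the update throws
            }
        });
        return true;
    }

    /** Forced update (e.g. a block edited right in front of the player): never rejected. */
    void forceUpdate(Runnable update) {
        executor.execute(update);
    }

    /** Number of regular updates that could still be started right now. */
    int freeSlots() {
        return slots.availablePermits();
    }
}
```

One way to wire it up would be `new ChunkUpdateSchedulerSketch(Runtime.getRuntime().availableProcessors(), Executors.newCachedThreadPool())`, which ties the limit to the core count as described in the post.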

Now I've pushed an update that throttles how many update threads the render loop starts: at most 20 updates per second. This works really well on my machine, but I've got a very strong processor (hexa-core i7), so this is not the most reliable source for benchmarks that actually matter. I constantly achieve FPS far beyond 200 with a viewing distance of 32 chunks (mostly thanks to the new trees and the leaf optimization).
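A throttle like this could look roughly like the sketch below, assuming it works by enforcing a minimum interval between update starts (50 ms for 20 per second). That is a guess at the approach, not the game's actual code, and the names are hypothetical.

```java
/** Hypothetical sketch: allows at most N update starts per second. */
final class UpdateRateLimiterSketch {
    private final long minIntervalNanos;
    private long lastStartNanos;
    private boolean startedOnce;

    UpdateRateLimiterSketch(int maxUpdatesPerSecond) {
        this.minIntervalNanos = 1_000_000_000L / maxUpdatesPerSecond;
    }

    /**
     * Meant to be called once per render-loop iteration.
     *
     * @param nowNanos current time, e.g. from System.nanoTime()
     * @return true if a chunk update may be started in this iteration
     */
    boolean tryStart(long nowNanos) {
        if (startedOnce && nowNanos - lastStartNanos < minIntervalNanos) {
            return false; // too soon since the last update was started
        }
        lastStartNanos = nowNanos;
        startedOnce = true;
        return true;
    }
}
```

Passing the time in as a parameter keeps the limiter independent of the frame rate: at 200 FPS most iterations simply return false.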

Here's the final question: how is the current version of the game performing (develop branch) on your side and what are your specs? :ugeek:
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Will give this a shot when I get home. Was previously tinkering with that a bit with Stuart, who hit some odd issues when running Blockmania while also streaming a movie to a TV (disk I/O issues? CPU and memory were OK)
 

begla

Cervator said:
Will give this a shot when I get home. Was previously tinkering with that a bit with Stuart, who hit some odd issues when running Blockmania while also streaming a movie to a TV (disk I/O issues? CPU and memory were OK)
Interesting. There should not be much I/O until the chunk cache has reached its limit. None if saving is completely disabled. Just found out that I bricked block lights a while ago... Pushed a fix and another small optimization. Also fixed the flickering of blocks at the borders of chunks.
 

Cervator

I noticed two lighting quirks so far (not with the new stuff, over the last few weeks), wonder if you've found/fixed those?

One is how blocks immediately outside generated caves (with no see-through holes) will somehow end up shaded on the outside (giving away the fact that there's a cave behind there)

Another is how you can find spots in caves (seen this multiple times) where lighting just refuses to work, period. Placing torches lights up the area up to a point, then it is just total darkness, even when placing torches inside the darkness.

Of course, if we could identify those spots and just spawn the right kind of monster inside it we could call it a feature :laugh:
 

begla

Cervator said:
Another is how you can find spots in caves (seen this multiple times) where lighting just refuses to work, period. Placing torches lights up the area up to a point, then it is just total darkness, even when placing torches inside the darkness.
That's the thingy I mentioned and fixed earlier today (and which I bricked some days/weeks ago). Glad I found the cause without hassle. :? "The first bug" you are mentioning will be a bit harder to fix, since it's caused by the nature of my algorithm, which calculates the lighting on a per-vertex level. Got this on my list though.
 

Cervator

Yay fix :)

Second fix is not at all important yet, so no worries there. It might not even be a horrible feature, really (especially if it helps avoid performance issues)

Okay - pulled Master, played around with it (yay easy pull + push via IntelliJ!). Skysphere looks great, although I don't see any clouds. Beautiful world, even if this seed seems to generate a lot of floating blocks, and a nice upright horse-shoe-shaped river :D

Two quirks noted:

1) Skysphere stuff looks a little too big IMHO - stars, sun, moon. Making everything relatively smaller would be a nice touch, I think.
2) The stars currently follow the player walk head bob thing, which looks sort of silly :D

Performance - game idles at 13% CPU for me, spread unevenly across 4 CPUs (well, 2 + hyperthreading) on a quad-core i7 @ 2.67 GHz. Can push it up to around 30% by running around, but my fan doesn't spin up crazy now during chunk generation like it did before :)

Hovers at around 1.4 GB memory out of 9 GB DDR3 total

Video card is an Nvidia GeForce GTX 260, but I'm not sure how much we take advantage of GPU vs CPU currently? Also curious what exactly pushes CPU usage up to 13% even when idle. Pure lighting updates? Would it be possible to limit light updates if the player can't see much anyway (underground)?

Edit: Oh, not sure if this is new, but when I place a torch the surrounding light tends to update in two barely separated instances rather than all at once

Edit2: On a different note, the light level at night is perfect! Dark is good - fake WoW night that's more like day is bad ;)

Do clouds make it darker at night?

Anton - if you want to do some more advanced sky stuff, rather than super duper fancy clouds we don't get a whole lot of functional benefit from (since we're not flying around in the clouds all the time), how about implementing the phases of the moon? If we only occasionally have a full moon we can base timed stuff on it, like making werewolves come out at night only with a full moon ;)

Heck, why do we even settle with a boring ole single Earth moon? Wouldn't it be neat to have the ability to have more than one moon, not linked directly to the phase of the sun, intelligently showing only the correctly lit part of each moon? Different colors? :D

That was one neat thing from the Krynn / Dragonlance universe, since they had three distinct moons that tied heavily into the lore and even the magic system of the world... yet it is still so rare to see more than one sun + one moon :)

Probably not a high priority, but would be neat to see

Edit3: Okay, so I'm not managing any AI studying tonight, but I'll be done with this soon enough already :p

Saw the video at http://www.youtube.com/watch?v=Y4CYnVzKonA - very nice, really like how well the music fits the scene. At one point I could see the sun - and it looks different than on my PC? Seems smaller. Attaching moon + sun shots from my PC (tho the moon had nearly scrolled out of view)

I was also able to provoke an occasional issue where destroying blocks in the direction of the sun at the horizon allowed me to glimpse the sun before the next block was drawn
 


begla

Woah. Thank you for all your feedback. Much more than I could possibly imagine! :shock:

I've just pushed some fixes that should address the lighting issues you were describing. This fix will only work efficiently on multi-core CPUs for the time being though. I've also decreased the darkness of shadows. Found that it was just a bit too much especially below trees and such. I'll get back to you on the other issues as soon as I have investigated them further! ;)
 

begla

I've started tweaking the skysphere a bit. Reduced the size of the sun and removed some minor visual artifacts. Let me see what I can do about the bobbing-skysphere-issue. :D
 

Adeon

terasology.ru
Contributor
Architecture
GUI
Logistics
The stars are just a texture.
We can increase its resolution and paint anything we like on it. =)
I'll try to do something with it.
Considering the Moon, yes, you're right. Now it's just a picture. That's not quite right.
Considering the phases, I agree with you, we need them very much.

The stars currently follow the player walk head bob thing, which looks sort of silly
I haven't understood anything. That's because the character you're playing is drunk now. I'm creating a balalaika for him and a pet - bear.
I'm joking. I'll fix it. =)
I won't touch the proportions of the Sun in the near future, as I'm making the bloom and the sunshafts now and I don't know how that affects the final picture.
 

begla

So, here they are... Occlusion queries (and occlusion culling)! Tightly integrated into Blockmania... and... disabled by default. Why? Because using them efficiently depends on the hardware setup and configuration, and I do not want to exclude any users because their hardware cannot use this technique efficiently or at all. But I've also added a lot of other small tricks to speed up rendering.

So what are occlusion queries? Very simply put, they are a technique that allows measuring how many pixels end up on the screen after (secretly :)) drawing something. This can be done asynchronously: the GPU calculates the result and the CPU fetches it later without interrupting the rendering process. So in a practical sense the technique can be used in the following way to reduce the overall number of rendered quads and vertices:

  • Render the (solid) bounding box of all chunk sub-meshes starting at the player's position (which can be done VERY fast)
  • For every single sub-mesh check if it is occluded or not
  • If not => Render the actual sub-mesh
  • Otherwise => Do nothing
This seems quite easy at first glance, but it takes some tweaks to be handled efficiently (and asynchronously). Solely using this technique with 256-block-high chunks is not a good idea, since even the bounding box of the most distant chunk is visible almost all the time. So I've split chunk meshes into a configurable number of sub-meshes, which are then used for frustum and occlusion culling. But here comes the next problem: increasing the number of sub-meshes increases the overall rendering batch count, which results in increased transformation and rendering overhead. So this technique tends to actually decrease the overall rendering performance if it's not treated correctly. I've put in some tweaks so that only distant chunks are used for occlusion culling (only the last 10% of the visible chunks at the moment, but this can easily be changed in our config file), and instead of 8 sub-meshes (like Minecraft uses, for example) I'm currently only splitting the chunks in half. I've got some more ideas to fully utilize occlusion queries, but it works fine if the system setup is right.
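The selection logic described above (skip empty sub-meshes, and apply the occlusion result only to the most distant share of chunks) can be sketched without any graphics API. The GPU query itself is not shown: the `occludedInLastQuery` flag stands in for a result that was fetched asynchronously, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: decides which chunk sub-meshes get rendered. */
final class OcclusionCullingSketch {
    /** Stand-in for one vertical slice of a chunk mesh. */
    static final class SubMesh {
        final boolean empty;               // no geometry at all
        final boolean occludedInLastQuery; // assumed: async GPU query result

        SubMesh(boolean empty, boolean occludedInLastQuery) {
            this.empty = empty;
            this.occludedInLastQuery = occludedInLastQuery;
        }
    }

    /**
     * @param chunksNearToFar   sub-meshes per chunk, chunks sorted near to far
     * @param occlusionFraction share of the farthest chunks that use the
     *                          occlusion result (0.1 = the last 10%)
     * @return the sub-meshes that should actually be drawn
     */
    static List<SubMesh> selectForRendering(List<List<SubMesh>> chunksNearToFar,
                                            double occlusionFraction) {
        int tested = (int) Math.ceil(chunksNearToFar.size() * occlusionFraction);
        int firstTested = chunksNearToFar.size() - tested;
        List<SubMesh> toRender = new ArrayList<>();
        for (int i = 0; i < chunksNearToFar.size(); i++) {
            boolean useOcclusion = i >= firstTested;
            for (SubMesh mesh : chunksNearToFar.get(i)) {
                if (mesh.empty) {
                    continue; // empty sub-meshes are never drawn
                }
                if (useOcclusion && mesh.occludedInLastQuery) {
                    continue; // distant and hidden behind other geometry
                }
                toRender.add(mesh);
            }
        }
        return toRender;
    }
}
```

Nearby chunks are drawn regardless of the query result, which matches the reasoning in the post: they are almost always visible, so testing them would mostly add overhead.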

I've more than doubled the rendering performance on my setup (to over 200% of what it was), even without occlusion culling. So at a spot in our current default seed where my FPS dropped below 60 using the old rendering code, I now get around 150-160 FPS, which is quite nice. This is currently tested using OS X Lion.

I've added a screenshot showing the new sub-mesh technique with 8 sub-meshes per chunk. This also shows that empty sub-meshes are ignored during rendering (no visible bounding boxes). The other screenshots show the occlusion culling technique with 2 sub-meshes per chunk and a VERY HIGH 25 chunk viewing distance (50 chunks in the settings). This makes a total of > 500 full chunks displayed at the same time and about 1000 sub-meshes. The debug view shows the following statistics: vsc (visible (full) chunks), ocul (sub-meshes removed by occlusion culling), smcul (sub-meshes removed by frustum culling), ec (empty sub-meshes removed).

I need some more feedback (with and without occlusion culling enabled). You can also experiment with the amount of sub-meshes per chunk and the distance offset at which occlusion culling is used. Just take a look at the config file, I've commented the relevant spots. :geek:
 


Cervator

Neat! Not much else I can say to that yet, other than that I'm happy to see lots of tweaking space for graphics. I'll test tonight when I get home from work :)
 

Cervator

Okay, tested a bit, after first spending a little time finding out how to test it. I'm going to have to get you to post more "QA Steps" so others can help out. I got Stuart to test it a bit too (after going through some over-technical fun with Git), but now he wants to do bad things to both of us for programmer-speak overload :D

We both had trouble even getting it to lag to test with, but enabling both occlusion and physics, then upping the view distance like mad helped:

viewingDistanceNear = 32
viewingDistanceModerate = 40
viewingDistanceFar = 64
viewingDistanceUltra = 72

(incidentally, do they ALL count, or is there a different setting picking one of those?)

I additionally had to enable god mode and fly up in the sky so I could see a ton before it would really lag, and it was pretty solid lag at that point with 13-33% CPU and 2.2-2.5 GB RAM (the latter when I was also blowing up tree tops). We also messed with the day/night cycle to be able to see better, which may also have impacted performance?

As a reminder I've got a quad core i7 with 9 GB of DDR3 ram

This is where I think the scripted "thin clients" via the GroovyManager might really come in handy, as you could write up a test script and then simply ask people to execute it. We could even come up with a standard "harvest hardware stats first" piece and maybe even record metrics :)
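As a rough idea of what a "harvest hardware stats first" step could collect, here is a small sketch using only the standard Java library. It is not based on the GroovyManager, the class name is hypothetical, and it cannot see the GPU, since the standard library does not expose that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: gathers basic system facts for a benchmark report. */
final class HardwareStatsSketch {
    /** Collects the stats in a stable, readable order. */
    static Map<String, String> collect() {
        Runtime runtime = Runtime.getRuntime();
        Map<String, String> stats = new LinkedHashMap<>();
        stats.put("os", System.getProperty("os.name") + " " + System.getProperty("os.version"));
        stats.put("arch", System.getProperty("os.arch"));
        stats.put("java", System.getProperty("java.version"));
        // Logical cores, so hyperthreading counts double (e.g. 2 cores show as 4).
        stats.put("logicalCores", String.valueOf(runtime.availableProcessors()));
        // Upper bound of the JVM heap, not the physical RAM of the machine.
        stats.put("maxHeapMiB", String.valueOf(runtime.maxMemory() / (1024 * 1024)));
        return stats;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A test script could print this block first and then append FPS and memory samples, so reports from different machines stay comparable.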
 

begla

Those viewing distances can be toggled using the F key. Damn. No one saw this tiny change in the README. :D I've put in a very short viewing distance as the default to make mobile users happy. :) This can all be decided dynamically later on, but it seemed like a valid solution at this early stage of our game.

So if I understand correctly... The game runs fluently? Much better than before? Glad to hear that!

If you want to use a 72 chunk view distance, you'll need to drastically increase the chunk cache size to somewhere over 5.5k to 6k chunks. And you'll probably need to increase the heap size of the JVM to 2048 MB or more. Might get a bit ugly there. Heh.
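The cache figure can be sanity-checked with a bit of arithmetic, under one loudly stated assumption: that the viewing distance setting is the side length, in chunks, of a square visible area. The thread does not confirm this. If the setting counts something else (a radius, for example), the numbers change. The names and the 10% headroom are hypothetical too.

```java
/** Hypothetical sketch: estimates a chunk cache size from the viewing distance. */
final class ChunkCacheEstimateSketch {
    /** ASSUMPTION: the viewing distance is the side length of a square area, in chunks. */
    static long visibleChunks(int viewingDistance) {
        return (long) viewingDistance * viewingDistance;
    }

    /**
     * @param headroom extra share kept in the cache beyond the visible area
     *                 (0.1 = 10%), so moving around does not evict immediately
     */
    static long suggestedCacheSize(int viewingDistance, double headroom) {
        return (long) Math.ceil(visibleChunks(viewingDistance) * (1.0 + headroom));
    }
}
```

Under that assumption a distance of 72 gives 72 x 72 = 5184 visible chunks, which lands in the 5.5k to 6k range once some headroom is added. The heap limit itself is raised with the standard JVM option `-Xmx2048m`.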

EDIT: Like the benchmark script idea!
 

Cervator

Ah! We were just wondering about the chunk cache. Yeah, the game starts out pretty much fluent for me at the crazy config, but that's at low memory usage. Moving around a bunch then makes memory usage dash to 2 GB territory, which is when the lag begins (and remains even if I fly back down to the ground). Physics I can use fairly easily; it lags, but I can god-mode + explodey-tool my way through an entire mountain without locking up entirely.

And yeah, I know there's stuff in the readme, but there's a lot of stuff everywhar! and I'm forgetful :D

That, and I was distracted by the mountains....

Edit: Re-noticing the "runs better than before?" bit then I can't really answer that since I don't have a good baseline (I never really pushed the game until today). Totally all over the thin clients, but... that's a little ways out ;)

Poked somebody at work to see if he was interested in tinkering with profiling last week or so - will re-poke when I get a chance!

Edit2: Managed to push it to OutOfMemory on heap when going to 96 view distance at 4k cache, nearly 3 GB process size before it croaked :D

(didn't play with heap settings, it was choking pretty badly at this point anyway, hehe)
 