Modding: per-block-storage extensions

Panserbjoern

Member
Contributor
Architecture
#1
Hi

I am currently working on per-block-storage extensions for mods. It's not finished yet, but I've made quite a bit of progress in the past couple of days.

Branch: https://github.com/mbrotz/Terasology/tree/per-block-storage

Please don't merge anything from that branch, though! I'm playing around with rebasing... ;)

What I've done so far:
  • Everything related to per-block-storage (tera arrays, chunk serialization, runtime chunk compression) is now handled via the new PerBlockStorageManager.
  • TeraArrays, factories, serialization handlers and deflators are now loaded via reflection.
  • Mods can now implement their own tera arrays including serialization handling, array factories and deflators, which will be detected and loaded on startup.
  • Mods can request per-block-storage extensions in their manifest (mod.txt).
A mod.txt with a per-block-storage extension would look like this:

Code:
{
    "id" : "books",
    "displayName" : "Books and Bookcases",
    "description" : "This mod introduces books and bookcases",
    "perBlockStorageExtensions" : [
        {"id" : "some-per-book-data", "factory" : "engine:16-bit-sparse"}
    ]
}
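
To illustrate how a manifest entry such as the one above might be resolved at startup, here is a minimal sketch. All names in it (ExtensionRegistrySketch, StorageFactory, registerFactory, requestExtension, allocateForChunk) are hypothetical and not taken from the branch. The branch discovers factories via reflection; this sketch uses explicit registration instead, and a plain short[] stands in for a real tera array.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: this is NOT the PerBlockStorageManager API from the branch.
final class ExtensionRegistrySketch {

    /** Creates backing storage for one chunk; stands in for a tera array factory. */
    interface StorageFactory {
        short[] create(int blockCount);
    }

    private final Map<String, StorageFactory> factories = new HashMap<>();
    private final Map<String, String> requested = new LinkedHashMap<>();

    /** The engine or a mod registers a factory under an id like "engine:16-bit-sparse". */
    void registerFactory(String factoryId, StorageFactory factory) {
        factories.put(factoryId, factory);
    }

    /** Records one manifest entry: extension id -> factory id. */
    void requestExtension(String extensionId, String factoryId) {
        if (!factories.containsKey(factoryId)) {
            throw new IllegalArgumentException("Unknown factory: " + factoryId);
        }
        requested.put(extensionId, factoryId);
    }

    /** Allocates storage for every requested extension of a newly created chunk. */
    Map<String, short[]> allocateForChunk(int blockCount) {
        Map<String, short[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requested.entrySet()) {
            result.put(entry.getKey(), factories.get(entry.getValue()).create(blockCount));
        }
        return result;
    }

    public static void main(String[] args) {
        ExtensionRegistrySketch registry = new ExtensionRegistrySketch();
        // A dense short[] stands in for the sparse 16-bit implementation here.
        registry.registerFactory("engine:16-bit-sparse", short[]::new);
        registry.requestExtension("some-per-book-data", "engine:16-bit-sparse");
        // The chunk size below is an arbitrary value chosen for the example.
        Map<String, short[]> storage = registry.allocateForChunk(16 * 16 * 16);
        System.out.println(storage.keySet());
    }
}
```

Failing fast on an unknown factory id is a design choice of the sketch; a real loader might prefer to log and skip the extension.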
Todo:
  • Implement the actual extensions (allocation and persistence).
  • Extend world views and the world provider to allow accessing the data.

Hope to land it soon! Any questions, suggestions, requests... ? :D

Panserbjoern
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
#2
Happy to see this get posted, have been seeing the commits on GitHub and eagerly awaiting news :)

Want this moved to the Incubator by the way?

Big question I have: Do you have an idea of how many blocks in a chunk should have unique data stored with this system before it is worth using it, performance-wise, over a pure entity-based approach instead? Something like bookshelves likely would be better with the ES as the blocks are very rare in a chunk, as opposed to something like a "temperature" example.

How much does the chunk compression help? I figure it goes something like this:
  1. Low count of used unique per-block data in a chunk (example: bookcases) - definitely use entities, not this
  2. Moderate count of used unique per-block data in a chunk (example: block integrity for most solid blocks) - compression pushes memory usage down to where it makes sense to use this system instead
  3. High count of used unique per-block data in a chunk (example: temperature) - have to use this system as it would otherwise take way too many entities
The trick being identifying the rough transition from #1 to #2
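
One rough way to reason about that transition is to compare the cost of a dense per-block array against an assumed cost per entity. The sketch below is only arithmetic. The 64 bytes per entity is a placeholder assumption, not a measured figure, the chunk size is a parameter, and runtime compression (which would lower the array cost and therefore the break-even point) is ignored.

```java
// Hypothetical back-of-the-envelope helper; none of these numbers are measurements.
final class BreakEvenSketch {

    /**
     * Number of data-carrying blocks per chunk at which one entity per block
     * costs as much memory as an uncompressed dense per-block array.
     */
    static long breakEvenBlocks(long blocksPerChunk, long bytesPerBlock, long assumedBytesPerEntity) {
        long denseBytes = blocksPerChunk * bytesPerBlock;
        return denseBytes / assumedBytesPerEntity;
    }

    public static void main(String[] args) {
        // 65536 blocks per chunk and 64 bytes per entity are assumptions for illustration.
        System.out.println(breakEvenBlocks(65536, 1, 64)); // prints 1024
    }
}
```

Under these assumed numbers, roughly a thousand data-carrying blocks per chunk would be the point where the dense array stops costing more than entities.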
 

Panserbjoern

Member
Contributor
Architecture
#3
Hi Cervator

Yes, why not move it to the incubator...

Well, using the books mod as an example was a bad choice, of course. You would rather use a per-block-storage extension for stuff like temperature, moisture or block-integrity.

I think that as soon as you have more than just a handful of blocks, you should use a per-block-storage extension. The efficiency of the current runtime compression implementation depends on how scattered your per-block data is...
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
#4
Moved to Incubator :)

Want to add the standard header to the first post here?

I see the PR and it looks very cool! Now we'll have to come up with some stuff to do with it :geek:
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
#5
A thought: I'm wondering if we should take a different approach to serialization. One of the weaknesses of the SparseArrays is that they can degrade during use, so perhaps we should just inflate them and/or run-length encode them for storage, and re-deflate them as part of the load process. Then we wouldn't need a different serialization form per array: we would store all of them run-length encoded (or whatever), with only the data type differing (byte, short, int, maybe nibble). The obvious loss is some load/save speed. The gains are simpler storage, greater compatibility (the save files won't depend on the different implementations) and potentially some space savings (the tera arrays are structured to remain performant for random read and write; the save format need not be constrained by this).
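
As an illustration of the storage idea, here is a minimal run-length encoding sketch for 16-bit per-block data. RunLengthSketch and its (count, value) pair layout are assumptions made for the example, not an existing Terasology save format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an implementation-independent save format.
final class RunLengthSketch {

    /** Encodes the values as consecutive (runLength, value) pairs. */
    static int[] encode(short[] values) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            short value = values[i];
            int run = 1;
            while (i + run < values.length && values[i + run] == value) {
                run++;
            }
            out.add(run);
            out.add((int) value);
            i += run;
        }
        int[] result = new int[out.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = out.get(k);
        }
        return result;
    }

    /** Inflates (runLength, value) pairs back into a flat array of the given length. */
    static short[] decode(int[] pairs, int length) {
        short[] values = new short[length];
        int pos = 0;
        for (int k = 0; k < pairs.length; k += 2) {
            int run = pairs[k];
            Arrays.fill(values, pos, pos + run, (short) pairs[k + 1]);
            pos += run;
        }
        return values;
    }

    public static void main(String[] args) {
        short[] original = {0, 0, 0, 5, 5, 0};
        int[] encoded = encode(original);
        System.out.println(Arrays.toString(encoded)); // prints [3, 0, 2, 5, 1, 0]
        System.out.println(Arrays.equals(original, decode(encoded, original.length))); // prints true
    }
}
```

After decoding, the loader would be free to deflate the flat array into whichever runtime representation suits the data.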
 

Linus

Member
Contributor
#6
Very cool. I think I can use this for the Cellular Automata system.
I wonder though, does it affect the memory usage of chunks that do not use the extra data (chunks without bookcases)?
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
#7
Per chunk data would affect every chunk. I feel it should only be used rarely and for things that have reasonably wide coverage and which need to be persisted - light, liquid and integrity are good examples. A lot can be done with entities - which don't have to be per-block - and a lot can be done with transient data in systems, as the signal mod does for circuits.
 

Linus

Member
Contributor
#8
Ah, okay. So if a layer of data is added the data should also be general enough to encourage reuse.
To stay with the liquid example: liquid data should be preferred over water data, so it can be reused for oil and magma.
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
#9
Yes, good point, although it assumes different liquids cannot overlap (which is likely the case).
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
#10
A sheen of oil on top of an ocean would be cool though :geek:

Immortius - how do you think the idea of treating liquids like a non-block would impact this? Rather than per-block data could we use some other data type/storage at a per-chunk level to track the bodies of liquid in it and their interactions with a liquid simulator? Or could we still support per-block data at a chunk level only as needed somehow?
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
#11
There are a few ways of doing this. One way might be to have an entity for each body of liquid for each chunk, containing a bitmask of which blocks contain the body in a subregion of the chunk. Then the body as a whole would hold a volume of liquid, and flow calculations would happen between bodies.
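
The bitmask idea can be sketched with java.util.BitSet. LiquidBodySketch, its fields and its block indexing order are hypothetical and only meant to show the shape of the data such an entity might carry.

```java
import java.util.BitSet;

// Hypothetical sketch: one instance per body of liquid per chunk subregion.
final class LiquidBodySketch {
    private final int sizeX;
    private final int sizeZ;
    private final BitSet mask;   // one bit per block in the subregion
    private float volume;        // total liquid held by the body as a whole

    LiquidBodySketch(int sizeX, int sizeY, int sizeZ) {
        this.sizeX = sizeX;
        this.sizeZ = sizeZ;
        this.mask = new BitSet(sizeX * sizeY * sizeZ);
    }

    // The indexing order (y, then z, then x) is an arbitrary choice for this sketch.
    private int index(int x, int y, int z) {
        return (y * sizeZ + z) * sizeX + x;
    }

    void add(int x, int y, int z) {
        mask.set(index(x, y, z));
    }

    boolean contains(int x, int y, int z) {
        return mask.get(index(x, y, z));
    }

    int blockCount() {
        return mask.cardinality();
    }

    /** True if both bodies claim at least one common block. */
    boolean overlaps(LiquidBodySketch other) {
        return mask.intersects(other.mask);
    }

    float getVolume() {
        return volume;
    }

    void setVolume(float volume) {
        this.volume = volume;
    }
}
```

An overlap check like this would also be one way to enforce the assumption that different liquids never share a block.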

Really it all has to tie back to how we want to render liquids and how liquid updates occur though.
 

4D enthusiast

New Member
Contributor
#12
I previously assumed that this would take an excessive amount of memory, but with the moderate view-distance setting (13*9*13) it would only be about 74MiB for 1B per block (+ a presumably much smaller overhead) using TeraDenseArrays. That's not trivial, but it also seems like an acceptable amount to use for a significant additional feature.

A possible way to reduce this somewhat is for each mod which registers per-block memory to specify which types of block this applies to. In many cases (e.g. temperature) this would just be every block anyway, but sometimes (e.g. liquid flow / solid structural integrity or plant growth/mineral composition) they would be disjoint, in which case it might be possible to pack things a little more tightly. That's more of an additional thing to be added after the basic version is working, but if it's desirable it might influence how the basic version is done.

The way I imagine this working is that mods can use an annotation or something to register a method that generates a list of the block storage it wants. The block storage manager then uses these to collect, for each data field, a name, a size/format and a list of blocks. It then calculates which data fields are required on disjoint sets of blocks and have the same data type, and assigns some of them to be aliases for the same index.
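
The aliasing step could look roughly like the greedy assignment below. The class and field names are hypothetical, block types and data types are represented as plain strings for simplicity, and greedy first-fit is just one possible strategy (it is not guaranteed to find the minimum number of slots).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of assigning data fields to shared storage slots.
final class SlotAliasingSketch {

    /** One requested data field: a name, a data type and the blocks it applies to. */
    static final class Field {
        final String name;
        final String type;
        final Set<String> blocks;

        Field(String name, String type, Set<String> blocks) {
            this.name = name;
            this.type = type;
            this.blocks = blocks;
        }
    }

    private static final class Slot {
        final String type;
        final Set<String> blocks = new HashSet<>();

        Slot(String type) {
            this.type = type;
        }
    }

    /** Greedy first-fit: reuse a slot if the type matches and the block sets are disjoint. */
    static Map<String, Integer> assignSlots(List<Field> fields) {
        List<Slot> slots = new ArrayList<>();
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Field field : fields) {
            int chosen = -1;
            for (int i = 0; i < slots.size(); i++) {
                Slot slot = slots.get(i);
                if (slot.type.equals(field.type) && Collections.disjoint(slot.blocks, field.blocks)) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0) {
                slots.add(new Slot(field.type));
                chosen = slots.size() - 1;
            }
            slots.get(chosen).blocks.addAll(field.blocks);
            result.put(field.name, chosen);
        }
        return result;
    }
}
```

With this scheme, a liquid-flow field on liquid blocks and an integrity field on solid blocks of the same data type would share one slot, while a field covering every block would get a slot of its own.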

I might as well also record here someone else's suggestion that we re-implement the existing things like lighting, biome and liquid data in terms of this, just leaving the existing methods as a shortcut for backwards-compatibility. Presumably these methods would fail or return a default value if used in worlds where no module requests those fields.

If we do do this, it would be good to not already have the getRawLiquid/setRawLiquid methods (PR #3309) in the fixed API.

Pinging @Mike Kienenberger (does that even work with a space in it?), @jellysnake and @Cervator , as they've been discussing this elsewhere.
 

Mike Kienenberger

Active Member
Contributor
Architecture
GUI
#13
@4D enthusiast, Yes, spaces apparently work, but you have to use @mkienenb to ping me. But I stumbled across your comment by accident anyway :) while trying to determine what the api for this feature would look like for a module developer.

As noted above, registering was originally handled like this in the module metadata file:
JSON:
{
    "id" : "books",
    "displayName" : "Books and Bookcases",
    "description" : "This mod introduces books and bookcases",
    "perBlockStorageExtensions" : [
        {"id" : "some-per-book-data", "factory" : "engine:16-bit-sparse"}
    ]
}
However, the Java API to access per-block storage seems incomplete.
WorldView (now ChunkView) contains:
Java:
    public int getExtension(int blockX, int blockY, int blockZ);
    public int setExtension(int blockX, int blockY, int blockZ, int value);
    public boolean setExtension(int blockX, int blockY, int blockZ, int value, int expected);
I would think an additional parameter of String "perBlockStorageExtensionId" would be required.
Java:
    public int getExtension(String perBlockStorageExtensionId, int blockX, int blockY, int blockZ);
    public int setExtension(String perBlockStorageExtensionId, int blockX, int blockY, int blockZ, int value);
    public boolean setExtension(String perBlockStorageExtensionId, int blockX, int blockY, int blockZ, int value, int expected);
It seems like we would also want to have methods that work with something other than int (Bit4, Bit8 and/or byte, Bit16 and/or short) unless we are expecting the module developer and anyone reviewing the code to be an expert on bit shifting and data encoding. Having matching types would make the code more maintainable.
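
As an example of the bit shifting that typed accessors would hide from module developers, here is a minimal 4-bit store that packs two values into each byte. NibbleStoreSketch is a hypothetical class written for illustration, not an existing tera array.

```java
// Hypothetical sketch of a 4-bit-per-block store (two values per byte).
final class NibbleStoreSketch {
    private final byte[] data;

    NibbleStoreSketch(int size) {
        // Round up so an odd number of entries still fits.
        this.data = new byte[(size + 1) / 2];
    }

    /** Returns the 4-bit value (0..15) stored at the given index. */
    int get(int index) {
        int packed = data[index >> 1] & 0xFF;
        // Even indices live in the low nibble, odd indices in the high nibble.
        return (index & 1) == 0 ? (packed & 0x0F) : (packed >> 4);
    }

    /** Stores a 4-bit value (0..15) without disturbing its neighbour in the same byte. */
    void set(int index, int value) {
        if (value < 0 || value > 15) {
            throw new IllegalArgumentException("Value out of 4-bit range: " + value);
        }
        int slot = index >> 1;
        int packed = data[slot] & 0xFF;
        if ((index & 1) == 0) {
            packed = (packed & 0xF0) | value;
        } else {
            packed = (packed & 0x0F) | (value << 4);
        }
        data[slot] = (byte) packed;
    }
}
```

A caller only ever sees get and set with plain values in the range 0 to 15, which is the maintainability benefit argued for above.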
 

4D enthusiast

New Member
Contributor
#14
@mkienenb As data-extension lookups would happen so frequently, I think it would be useful (for efficiency) to have something like
Java:
public int getExtensionIndex(String perBlockStorageExtensionId); //This line probably somewhere else
public int getExtension(int perBlockStorageExtensionIndex, int blockX, int blockY, int blockZ);
as well as having the "getExtension(String..." version for convenience.
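
A minimal sketch of how the index-based and String-based lookups could sit side by side is shown below. ExtensionAccessSketch, its dense int[][] backing store and the "set returns the previous value" behaviour are all assumptions for illustration; they are not the actual WorldView/ChunkView implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: extension data addressed by a cached index or by id.
final class ExtensionAccessSketch {
    private final Map<String, Integer> indexById = new HashMap<>();
    private final int[][] data; // [extensionIndex][blockIndex]
    private final int sizeX;
    private final int sizeZ;

    ExtensionAccessSketch(int sizeX, int sizeY, int sizeZ, String... extensionIds) {
        this.sizeX = sizeX;
        this.sizeZ = sizeZ;
        this.data = new int[extensionIds.length][sizeX * sizeY * sizeZ];
        for (int i = 0; i < extensionIds.length; i++) {
            indexById.put(extensionIds[i], i);
        }
    }

    private int blockIndex(int x, int y, int z) {
        return (y * sizeZ + z) * sizeX + x;
    }

    /** Resolve once, then reuse the index in hot loops. */
    int getExtensionIndex(String perBlockStorageExtensionId) {
        Integer index = indexById.get(perBlockStorageExtensionId);
        if (index == null) {
            throw new IllegalArgumentException("Unknown extension: " + perBlockStorageExtensionId);
        }
        return index;
    }

    int getExtension(int perBlockStorageExtensionIndex, int blockX, int blockY, int blockZ) {
        return data[perBlockStorageExtensionIndex][blockIndex(blockX, blockY, blockZ)];
    }

    /** Convenience overload that pays for a map lookup on every call. */
    int getExtension(String perBlockStorageExtensionId, int blockX, int blockY, int blockZ) {
        return getExtension(getExtensionIndex(perBlockStorageExtensionId), blockX, blockY, blockZ);
    }

    /** Assumption: returns the previous value, mirroring the int return type quoted above. */
    int setExtension(int perBlockStorageExtensionIndex, int blockX, int blockY, int blockZ, int value) {
        int block = blockIndex(blockX, blockY, blockZ);
        int previous = data[perBlockStorageExtensionIndex][block];
        data[perBlockStorageExtensionIndex][block] = value;
        return previous;
    }
}
```

A system that touches many blocks per tick would call getExtensionIndex once during initialisation and use only the int overloads afterwards.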

Actually, it looks like each WorldView had only 1 set of extension data accessible, which is set at its creation time. That would make getting the data possible, but seems like an awkward interface.
 

Mike Kienenberger

Active Member
Contributor
Architecture
GUI
#17
@4D enthusiast,

I'm not sure how much you want to get involved with the implementation of per-block storage. If you have little interest, it seems to me that we can define what the API should look like and then, in the short term, implement it on top of the existing extraData TeraArray, or possibly add another one if need be. That would allow you to move forward without being blocked by a per-block-storage project, and since the data allocation already exists, we can make it available without worrying about impacts on the existing engine.