Architecture discussion for v3 [engine split / engine-libs]

oniatus · Aug 29, 2017

The v2 branch already includes a lot of features, most of them coming from the current GSoC.
I would like to use this thread to discuss what the architecture of the engine should look like in v3.

My main issue with the current project is the size. Everything is in the engine, all in one huge project.
The advantage is obvious, you find everything in one place.
The disadvantage, in my opinion: everything should not be in one place, especially not when building software.

We have a lot of subpackages, let's have a look at them:

audio: sound stuff
config: property system, to be extended/exchanged with the flexible config soon
context: dependency registry
engine: wiring systems together (oh, we have an engine in the engine - engineception?)
entitySystem: engine layer above gestalt entitysystem, adds lifecycle and prefab support
game: data classes for a game (setup, seed, ...)
i18n: internationalization
identity: certificate stuff
input: input and bindings management
logic: (ranting) "i don't know where to fit my sources, but it is some kind of logic" - contains a bit of everything
math: engine math classes, based on TeraMath and bullet
monitoring: performance monitoring
network: network logic, also one of the "no one will ever touch it" sections
particles: particle system
persistence: serialization of events/entities
physics: physics system
reflection: utility classes for reflection
registry: close to context, holds the dependency injection
rendering: magic section
telemetry: telemetry gsoc code
unicode: I found no references to any class in this package, looks like dead constants to me
utilities: close to logic, a bit of everything
version: version dto
world: world stuff, like generator logic, block shapes etc.

In my opinion, these should be separate projects and have as few dependencies on each other as possible. This way we can test them separately, define a proper API for each block/layer and wire everything together in the engine. In fact, we did that already, it is only hidden in one huge thing with 250k lines of code.

Of course it will not make sense to do a hard package split, we need to inspect things in detail.
Therefore it will make little sense to do it in one huge change.

What I would propose is the following:
1. Introduce a new layer between modules and the engine: Engine-libs. Move the entire engine project to engine-old and make it depend on the old engine. The code will do the same and be under the same versioning.
2. Start to move small pieces to more engine-libs, e.g. math, physics, ... The code will stay the same and will also be under the same versioning (I think checking out 30 repos to get the engine working is not useful).
This will also reveal architecture issues because the new engine-libs should not depend on engine-old.
If we repeat this for some time, we may end up with a better split based on projects, have the option to throw away entire projects with a new implementation (inventory is next?) and make it easier for a new developer to get into the code.

Opinions are welcome :gooey:

Cervator · Aug 29, 2017

Disclaimer: Everything IMHO and possibly sleep deprived rantings

Good stuff and agreed

I still have a v2 architecture/goal post floating around in my head to post sometime + set up related GitHub projects and such. I think the focus there should be finalizing the game logic foundation - the external API for modding and such. Multi-world, extracted Gestalt-entity, etc. The building pieces we'd want to have in place to go a long time without breaking changes, even something we could hit Beta with. That may be more of a v3 if we want to merge all the GSOC stuff needing a v2 and this could be v4, still a bit unsure ...

This (continuing to call it v3 for now) could be the next round after that, which cleverly could be done in a way that doesn't affect the external API. So in that case it might even sneak by without needing a major version increment for the engine, but it might justify it anyway.

I think there are three different categories of stuff to extract:

(External) Libraries - things that have utility on their own, even perhaps to other projects. TeraMath is a good example, not sure how much is left in the engine or if the rest could be moved over. In that case we actually might want to move to JOML or something from LibGDX, maybe redoing TeraMath as a minimal wrapper with voxel-specific extensions. Context/registry might be another useful thing to extract (has its own thread). Entity system partially goes away when we depend on gestalt-entity instead (hoping for v2 - can dream, right?). Maybe even something like the renderer one day. World generation with facets? That might be an extension instead ...
Modules - some things remain in the engine that could be moved to modules. Particle system might be a good example, and it may well need rendering to proceed a bit more to enhance what can be done in modules. The behavior tree stuff is another one (and is at least partially done with that GSOC item, again maybe pending more extraction/reorganizing - unsure)
(Engine) Extensions (the engine-libs thing) - things that aren't really entirely complete on their own, and wouldn't make much sense as an independent library. This would be stuff that is tied to the engine so much and/or is an optional extension to the engine. Telemetry may be a good example, since the whole thing can be disabled. Maybe the performance monitor or some of the world gen stuff. Maybe even Soni's desired internet/database/filesystem connectors, but that feels more like highly privileged modules IMHO. We'd likely want to ship all these with the game but leave them optional and allow alternatives. Gradle/Groovy/Git magic can help develop them (and everything else) in a single workspace.

It is late here and this is just a "real quick" initial response (yes, quick!), but we could go over the packages in more detail and thinking about them individually might help a lot. Like would i18n be an external library or a subsystem / engine lib / extension?

As for approach I would be very hesitant to rename/move the entire engine dir in one go, as it tends to end up resulting in huge headaches around pending work (or a tricky code freeze) and handles poorly in revision history (at least some tools don't follow renamed/moved stuff well). I'd just start formalizing the modding API better (public API / front-end), letting it optionally start rerouting away from an "old" engine system to a new one at some point in the future, doing systems one by one. Eventually as activity moves to a new dir/repo the old history becomes less useful, the affected files can be marked as deprecated, and finally deleted in a major engine release some time after.
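To make the "reroute one system at a time" idea a bit more concrete, here is a minimal Java sketch. All names in it (ParticleApi, LegacyParticleSystem, ExtractedParticleSystem, ApiRouter) are hypothetical and do not exist in the engine; it only illustrates a public front-end that can be pointed at an old or a new implementation without callers noticing, with the old one marked as deprecated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical public API that modules would code against. */
interface ParticleApi {
    String emit(String effect);
}

/** Stand-in for an existing system inside the old engine, marked for later removal. */
@Deprecated
class LegacyParticleSystem implements ParticleApi {
    @Override
    public String emit(String effect) {
        return "legacy:" + effect;
    }
}

/** Stand-in for a replacement that lives in a new dir/repo. */
class ExtractedParticleSystem implements ParticleApi {
    @Override
    public String emit(String effect) {
        return "extracted:" + effect;
    }
}

/** Front-end that decides, per API type, which implementation answers. */
class ApiRouter {
    private final Map<Class<?>, Supplier<?>> routes = new HashMap<>();

    /** Registers (or replaces) the implementation behind one API type. */
    <T> void route(Class<T> api, Supplier<? extends T> implementation) {
        routes.put(api, implementation);
    }

    /** Looks up the currently routed implementation for an API type. */
    <T> T get(Class<T> api) {
        Supplier<?> supplier = routes.get(api);
        if (supplier == null) {
            throw new IllegalStateException("No implementation routed for " + api.getName());
        }
        return api.cast(supplier.get());
    }
}

class RerouteDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        ApiRouter router = new ApiRouter();

        // Initially the API is backed by the old engine system.
        router.route(ParticleApi.class, LegacyParticleSystem::new);
        System.out.println(router.get(ParticleApi.class).emit("smoke"));

        // Later, only the routing changes; callers keep using ParticleApi.
        router.route(ParticleApi.class, ExtractedParticleSystem::new);
        System.out.println(router.get(ParticleApi.class).emit("smoke"));
    }
}
```

The point of the sketch is that the switch happens in one place, so systems could be moved over one by one while pending work against the old code keeps compiling.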

manu3d · Aug 29, 2017

Generally speaking I agree with @Cervator's assessment. @oniatus I also agree with your highlighting of the problem, I'm just less convinced about drastic solutions.

I'd certainly endorse a package-by-package evaluation and a roadmap tagging some packages for extraction from the engine. I don't think it would make sense that -all- packages listed above become separate projects, certainly not initially.

I.e. it is possible to identify a number of packages that make sense to keep in the engine as 1) they characterize the engine and differentiate it from other engines 2) they provide services that are very likely to be used in all Terasology-based projects 3) they provide services that are not exactly exciting aspects of the videogame world. I.e. I am thinking about the config, reflection, internationalization, telemetry and input packages, to name a few.

On the other hand I can certainly see some packages such as rendering, particles, logic and game that perhaps could be extracted - the first two for sure, the other ones I don't know them enough.

So, there is potential there, but it sounds like quite an effort from where we are. I'd want to be well into v2 and perhaps even in an intermediate v3 before we tackle something as big as what you are suggesting.

oniatus · Aug 30, 2017

I don't want to say we should do it all at once.

What we can do, even without touching the existing codebase:
-create a new project engine2 (or a better name)
-make PC and the other facades depend on engine2
-make engine2 depend on engine
-keep engine-tests as is

engine2 would be in a folder next to engine, under the same version control, and initially empty. This will not break existing PRs and preserves existing functionality.
Then:
-create a second folder engine-libs, containing fully qualified gradle projects.
-once we identify a feature set which can be extracted, create a new engine-lib-project for it
-add the engine-lib-project as dependency to engine2 and move the code from engine to the engine-lib-project
-move tests for the feature directly to the feature project
-(best case) add CI for each engine-lib-project; they should be buildable and testable on their own and have NO dependency on engine2, though they are likely to require engine as a dependency.

We can always extract features with no pending PRs to keep conflicts as low as possible. If we iterate the process over a longer time period we may end up with an empty engine, a lot of wiring logic (subsystems, main loop,...) in engine2 and a lot of small projects with little to no cross-dependencies which can be tested or exchanged as needed.
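The dependency rules from the two lists above can be written down as a small check. This is only a Java sketch under assumptions: the project names and the hand-written dependency map are illustrative (engine-libs/telemetry is just an example lib), and a real check would read the graph from the build instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Project dependency graph: project name -> projects it depends on directly. */
class ProjectGraph {
    private final Map<String, List<String>> directDependencies;

    ProjectGraph(Map<String, List<String>> directDependencies) {
        this.directDependencies = directDependencies;
    }

    /** Every project reachable from the given one by following dependencies. */
    Set<String> transitiveDependencies(String project) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(directDependencies.getOrDefault(project, List.of()));
        while (!pending.isEmpty()) {
            String next = pending.pop();
            // Only expand projects we have not visited yet, so cycles terminate.
            if (seen.add(next)) {
                pending.addAll(directDependencies.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    /** True if 'from' depends on 'to', directly or through other projects. */
    boolean dependsOn(String from, String to) {
        return transitiveDependencies(from).contains(to);
    }
}

class EngineSplitRules {
    public static void main(String[] args) {
        // Illustrative layout: facade -> engine2 -> engine, libs next to engine.
        ProjectGraph graph = new ProjectGraph(Map.of(
                "facades/PC", List.of("engine2"),
                "engine2", List.of("engine", "engine-libs/telemetry"),
                "engine-libs/telemetry", List.of("engine"),
                "engine", List.of()));

        // The facade still reaches the old engine through engine2.
        System.out.println(graph.dependsOn("facades/PC", "engine"));
        // The rule to enforce: an engine-lib must never reach engine2.
        System.out.println(graph.dependsOn("engine-libs/telemetry", "engine2"));
    }
}
```

Run per engine-lib-project in CI, a check like this would fail as soon as someone adds a dependency back towards engine2.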

My goal is not to extract separate projects and repositories which can be re-used in other engines or third projects. Instead I want to split based on responsibility to clean up the architecture and grow clean islands in new projects where it is easier to get started than hopping in the entire engine at once.

Cervator · Aug 30, 2017

Okay, I think I understand you better now then. You want "engine-libs" to be like "facades" with subdirs being entirely independent sub-projects / modules (in IntelliJ speak, not Terasology speak - but perhaps also in Java 9 speak). That helps split everything apart and you more readily can calculate unit test coverage and solidify "sub-APIs" per component rather than let the engine be its own huge thing. I had just thought about the extensions concept for high privilege modules, but really the logistical benefits of splitting apart components/APIs for unit testing is a big deal.

Not so much split out into independent repos, but split apart within the engine repo. Although at that point - why not have separate repos? Multiple CI jobs on the same repo (one building engine-libs/telemetry, another building engine-libs/particles etc) seem like they'd be kinda awkward (but doable). The PR builder type build logic tends to expect one "thing" per repo, so that any PR applies to the specific thing in that repo. I dunno how much hassle it would be to listen globally (some commit landed in the repo) but act locally (only do CI, review, etc, for the subset). Pretty sure you can do it, especially with Jenkinsfiles, but dunno how awkward it might get. That was the approach on my last work team and IMHO it was awful - but admittedly those were mega-monolith legacy apps facing a flawed DevOps design (again IMHO - DevOps is so vague these days)
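The "listen globally, act locally" part could look roughly like the Java sketch below. It assumes the layout discussed here (sub-projects as direct subdirs of engine-libs/) and that the CI job can get a list of changed paths relative to the repo root; the class and method names are made up for illustration and nothing here is tied to Jenkins.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class AffectedProjects {
    private static final String LIBS_PREFIX = "engine-libs/";

    /**
     * Maps changed file paths (relative to the repo root, '/' separated) to the
     * engine-libs sub-projects containing them. Anything outside a sub-project
     * directory is attributed to the catch-all "engine" build.
     */
    static Set<String> forChangedPaths(List<String> changedPaths) {
        Set<String> projects = new TreeSet<>();
        for (String path : changedPaths) {
            if (path.startsWith(LIBS_PREFIX)) {
                // Find the end of the sub-project directory name, if there is one.
                int end = path.indexOf('/', LIBS_PREFIX.length());
                if (end > LIBS_PREFIX.length()) {
                    projects.add(path.substring(0, end));
                    continue;
                }
            }
            projects.add("engine");
        }
        return projects;
    }

    public static void main(String[] args) {
        System.out.println(forChangedPaths(List.of(
                "engine-libs/telemetry/src/main/java/Foo.java",
                "engine-libs/particles/build.gradle")));
    }
}
```

A PR builder would then only run CI and review for the returned subset, while a commit touching the engine itself still triggers the full build.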

Whether we find ways to treat them as clean islands within a single repo or split out into repos but Gradle/Groovy logic it so well it becomes seamless, I agree with that part of the approach.

I'm still not convinced about "engine2" tho. What does that buy us that moving, say, telemetry under engine-libs/ doesn't on its own? It might feel cleaner to do the roundabout step of fitting in engine2 as an API layer of sorts? Which is what I was getting at a bit last post: formalize the modding API better. Maybe modding API is too low scope there - a new API layer so we have engine - API - engine-libs - API - modules? Or something like that. Then we also don't need to rename "engine2" to anything more sensible later - just call it something to do with API layering?

In any case we certainly could start by simply splitting things out of the engine dir, but staying within the same repo. Much like how we took content-type stuff out of the engine and parked it in Core, pending full extraction out into purpose-specific module repos.

I'm very split repo happy (since I like CI and PRs per repo), but will admit it adds to the logistics, without supremely well done automation. And it is largely on me not having pushed the utility angle there further yet, with just the Groovy Wrapper and the new utility scripts prepared recently - more coming, hopefully

oniatus · Aug 30, 2017

My main reason for a single repo is the developer workflow to keep the engine up to date.
From the perspective of a developer who checks out the engine repo(s) once, then after 3 months wants to continue and be up to date:
In the current state it would be a simple pull on the develop branch. If we split it in different repos, one would have to pull and manage multiple repos.
This is something I find difficult on the modules from time to time, e.g. keeping track of Pathfinding, Behaviors, the engine and 2-3 other related modules for the behavior changes was a pain

Should we do so, there must be an easy workflow for a developer to keep up to date, otherwise we may introduce viscosity in the environment which will also lead to bad code and workarounds

For CI: I think it is okay to build all projects in the engine repo on each change and maybe improve later on if it seems reasonable.
We are building the entire engine codebase on each PR right now and the split will not increase the amount of code to build, only the amount of projects -> and project resolution should be pretty fast with gradle, compiling, tests and static analysis should take most of the build time, right?
So I think we would not lose anything this way. Instead we gain some places for later optimization on the CI side, if we can build artifacts independently of each other.

Cervator · Aug 31, 2017

Absolutely agreed on the utility angle - more is needed to sustain split-repos. And it is doable too, the delicious luxury of "groovyw module recurse JoshariaSurvival" to then get every single JS module in source form in one go... yum

A super build probably would be slower than current, but yes, I don't imagine it'll be much slower. One drawback is if a tool expects only one "thing" - for instance at present the Jenkins javadoc publish expects only a single path, so it can't grab javadoc from independent sub projects and publish all of them. But in that case we could simply rely on that super shiny new doc site somebody cool made for us

We'd have to dig a bit deeper to see if there are any similar trouble spots, and how much hassle it would be to either get around them or live with a poorer result for something.

Thoughts on the "engine2" bit and API layering? If that even makes sense as a term.

oniatus · Aug 31, 2017

"engine2" as API layer sounds good, we may even call it engine-api then.
A separate project will also force us to touch everything we need at least once

Cervator · Oct 10, 2017

Tiny side note while I'm getting exposed to some things of interest at GitHub Universe: the new "code owners" feature might work well with these kinds of sub components if they remain in a single repo. Essentially you have a config file in the repo that declares specific dirs to be "owned" by specific users or teams which enables some more review options. We somewhat discourage explicit ownership, but another way to think of it would be expertise or responsibility in making sure the area is taken care of.

Cervator · Oct 22, 2017

Another popup thought on the topic: maybe this could help split apart the growing monolith that is the menu language file. The bigger it gets the more unwieldy to handle - maybe it would make sense to chop it into smaller groups, even if that means a few more components in Weblate (not that we have a lot yet - having a few for the engine and one per module seems reasonable)

On the other hand that may complicate things needlessly, especially for shared strings. And I'm not even sure the groupings are as clean as the nice list at the very top, the language file entries do not necessarily map to packages like that (they might all naturally belong in UI, even if behind the scenes a few are specific to audio-related strings)

oniatus · Dec 19, 2017

I gave it a short try today and tried to add an engine-api project and make core depend on it instead of engine.
It did not work that well.

One issue is that engine as a project is hardwired in many places. One example is the ModuleManager, which spawns engine as a module in the constructor and makes it available to other modules. To get rid of this we would have to move the code to engine-api, which would pull a bulk of other code along with it.

My suggestion for now would be to go with the least-impact solution and keep engine as a project as-is, but add a set of submodules/subprojects directly to engine. These should have no dependency on the old engine, thus we will have to start by refactoring out the essentials like the entity system first. The advantage would be that this has 0 impact on modules and also 0 impact on ongoing PRs.

What I would like to have for these subprojects is a higher quality bar:

-Required high test coverage (I suggest starting with 100% and lowering it only if needed). In the best case we can back this up with a tool like pitest, which has a nice Gradle plugin.
-Same checkstyle settings as for engine.
-Violations (too low coverage or checkstyle) should break the build.
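As a rough illustration of the "break the build" rule, here is a minimal Java sketch of a coverage gate. It assumes some tool has already produced covered/total line counts per sub-project; the Coverage and CoverageGate classes and the numbers are hypothetical, and this does not use or imitate the pitest API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Line counts for one sub-project, as some coverage tool might report them. */
class Coverage {
    final int coveredLines;
    final int totalLines;

    Coverage(int coveredLines, int totalLines) {
        this.coveredLines = coveredLines;
        this.totalLines = totalLines;
    }

    double ratio() {
        // A project without any lines has nothing left uncovered.
        return totalLines == 0 ? 1.0 : (double) coveredLines / totalLines;
    }
}

class CoverageGate {
    /** Returns one message per sub-project whose coverage is below the minimum ratio. */
    static List<String> violations(Map<String, Coverage> report, double minimumRatio) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Coverage> entry : report.entrySet()) {
            Coverage coverage = entry.getValue();
            if (coverage.ratio() < minimumRatio) {
                found.add(entry.getKey() + ": " + coverage.coveredLines + "/"
                        + coverage.totalLines + " lines covered");
            }
        }
        return found;
    }

    /** Throws if any sub-project is below the threshold, which would fail the build step. */
    static void enforce(Map<String, Coverage> report, double minimumRatio) {
        List<String> found = violations(report, minimumRatio);
        if (!found.isEmpty()) {
            throw new IllegalStateException("Coverage below threshold: " + found);
        }
    }

    public static void main(String[] args) {
        Map<String, Coverage> report = new LinkedHashMap<>();
        report.put("entity-system", new Coverage(120, 120));
        report.put("module-system", new Coverage(90, 100));
        // Start strict at 100%; the threshold can be lowered per project if needed.
        System.out.println(violations(report, 1.0));
    }
}
```

Keeping the threshold as a parameter matches the suggestion to start at 100% and lower it only where a sub-project turns out to test poorly.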

Good candidates to begin with are probably the entity system and the module system, as they should have little to no sub-dependencies.

Edit: PR started: https://github.com/MovingBlocks/Terasology/pull/3188

4D enthusiast · Sep 9, 2018

I sort of generally assume that unit tests are the sort of thing I'm likely to be more keen on after actually having more experience with problems in large projects, but that said, having obligatory test coverage seems like a bad idea. I can believe that there are cases where it would be useful to encourage more use of unit-tests, but there are also cases where the desired behavior is just something like the results looking right, and any tests made for that would mostly just be re-phrasing the main code. There's also the problem that whatever sort of automated checks you put in place, some people will inevitably just add enough to technically pass the test-tests, but not actually test things. We already have lots of empty doc-comments, presumably as people thought there should be doc-comments for something but didn't actually want to bother writing them. I don't know how coverage is measured, but it's at least impossible for it to be perfect (halting problem &ct.).

Separating the engine into parts like this also feels likely to be more hassle than it's worth. I'm already finding myself frequently changing the engine API, and if there were an API boundary between terasology.world and terasology.rendering, for example, that API would end up changing a comparable amount to the engine-lib internals. Perhaps there are some other parts of the engine which would work better, but I have less experience there.

The conversation seems to be mostly settled against having these as separate repos, but as an additional point in that direction, coordinating matching PRs for several engine-libraries at once sounds awfully awkward.

Cervator · Sep 12, 2018

On test coverage - I might be reading between the lines here but I figured @oniatus meant to select for sub-projects that would be suitable for very high test coverage to reach for the big 100%. It is true that some things test more poorly.

The engine-libs thing in general IMHO has good potential, but isn't necessarily worth prioritizing for some time. There is a lot of more end-user important stuff that should come first, which is part of the reason I left engine-libs in v4 on the roadmap, rather than in v3. And that's still more of a guideline than a given.

I would picture it as closely grouping things behind the engine API barrier so you could change the engine-libs readily without affecting engine-api, focusing any big work on that sort of approach, then when truly needing to affect engine-api do it in a way where you can slowly trickle it in rather than do a big-bang PR explosion. Imagine first preparing a large new feature behind the API wall without exposing it at all outside some proof-of-concept / testing layer, then when it is stable merge it and then prepare the API affecting steps to truly expose it in a separate and vastly smaller operation.

Admittedly that may not help much if as you say you're working on something that affects multiple API-separated chunks of the engine in one go. Or if for some reason you find need to change things massively regularly - but at that point I'd say that's a sign that we're doing something wrong.

Right now we still have a fair amount of work to do on the engine before it is juuuuuust right and suitable to lock down further. That's what I'm expecting v3 is for, although on the other hand we're way overdue on the whole "Focus more on making a game than an engine!"
