Well, that's about 50x the reply I was expecting to get. Thanks for the thorough dissection, @manu3d.
My bad for taking so long; every time I thought I had it down, I'd notice something else that could break.
This reads like a terrible wiki page. Sorry for that. I'll clean it up as soon as I can.
And before we begin, can I just mention how wonderful a learning experience this is proving to be? I don't think I've ever had to write a spec for something with this kind of scope or detail before, and it's pretty fun, if you ignore the incoherent screaming and the sound of heads being banged on walls.
Okay, let's start with the easy ones.
- Okay, so I finally registered, just to reply to this thread.
You honor me, @vampcat (awesome handle, by the way).
- DON'T render with multiple threads. You just don't. The GPU is already using all its cores and resources when you ask it to render via one thread; asking it to render via multiple will only slow down every single process and yield no performance benefit. Plus, there is the problem of contexts and displays being specific to the current thread in a lot of libraries, along with a lack of thread-safe code, so that won't work. And then there's the headache of ensuring their isolation in edge cases like buffer swaps. Will it be worth the time we invest in it?
Yeah, that makes sense. I didn't really understand how this worked before. And a fair bit of the reason I designed things the way I did was for dealing with those edge cases, so a lot of that can now change.
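For my own future reference, here's roughly what that single-context constraint looks like in plain LWJGL 3/GLFW terms (a generic sketch, not engine code; the task-list call is a stand-in):

```java
import org.lwjgl.glfw.GLFW;
import org.lwjgl.opengl.GL;

public class SingleThreadedRenderLoop {
    public static void main(String[] args) {
        GLFW.glfwInit();
        long window = GLFW.glfwCreateWindow(1280, 720, "demo", 0, 0);

        // The GL context can be current on only ONE thread at a time;
        // every GL call must come from that thread.
        GLFW.glfwMakeContextCurrent(window);
        GL.createCapabilities();

        while (!GLFW.glfwWindowShouldClose(window)) {
            // The flattened render task list would run here, sequentially.
            // Worker threads may prepare data, but must never touch GL.
            GLFW.glfwSwapBuffers(window);
            GLFW.glfwPollEvents();
        }
        GLFW.glfwTerminate();
    }
}
```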
- I'm not sure why we need separate FBOManagers for each "layer", especially considering the fact that there are different types of FBOs (although we *can* have a dumb FBOManager and smart FBOs.. but that's a different discussion) and so we'll need at least 2 (one resolutionDependent, one resolutionIndependent) FBOs for each layer. Why a different context for each layer? I have slightly different concerns about this than manu3d.
I thought this over, and if we namespace FBOs correctly, we should be fine with just one FBOManager global to the (current) RenderGraph. The separate contexts were for the sake of allowing even more flexibility with Node design, but that's probably a little overboard, for now at least. They were also meant to help in making everything thread-safe, which, as you pointed out, we don't really care about.
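To make that concrete, namespaced lookups on a single graph-wide FBOManager might look something like this (a sketch only; `get()` and the URIs below are assumptions, not existing API):

```java
// Sketch: one FBOManager for the whole RenderGraph, with FBOs namespaced
// by URI so modules can't trample each other's buffers.
FBOManager fboManager = renderGraph.getFboManager();

FBO sceneOpaque = fboManager.get(new SimpleUri("engine:sceneOpaque"));
FBO tintBuffer  = fboManager.get(new SimpleUri("myModule:tintBuffer"));
```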
- Complicated? Please checkout a commit from mid-2014, just before I started working on Terasology, and take a look at the rendering code. Then we'll talk complicated. And inflexible. And fragile. Functional, definitely, but untouchable. And guess who ended up touching it so that other people can now aspire to improve it?
Tried going through that code. My brain threatened to melt out my ears 5 minutes in. I have absolutely no idea how you managed to fix that.
Also, the new render code is surprisingly clear. Still fragile and inflexible, but clear enough that it only took a moderately skilled Java programmer 2-ish hours to understand pretty much the whole thing. You can pat yourself on the back for that.
- I'm not sure what you mean with "target frame" exactly
Here I was basically referring to the final render target. Do note that this could be something completely different from the main screen, e.g. rendering CCTV feeds to a monitor texture.
- This was also discussed during last year's GSOC. We simply didn't get around doing it for lack of time. One probable tweak: ResourceUrns are probably overkill as we don't need all their features. We might want to switch everything to SimpleURIs, a very similar but slightly lighter class.
Noted.
- Sure, that's one way to do it. I wouldn't mind direct references either though.
The main reason I want to do this with URIs is so that Nodes can easily reference other Nodes, particularly standard Nodes defined in the engine, without having to worry about injection and contexts and stuff, something we don't want module designers to worry about. Kind of like how DNS simplifies connecting servers together.
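For illustration, wiring by URI might look something like this (`connect()` is an assumed method, purely for the sketch):

```java
// Sketch: a module node hooks itself up to a standard engine node by
// name, with no injection or direct references involved.
renderGraph.connect(
        new SimpleUri("engine:opaqueObjectsNode"),  // upstream
        new SimpleUri("myModule:lensFlareNode"));   // downstream
```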
- In the block above you seem to be hinting to some degree of CPU-side parallelism, with different layers potentially being processed at the same time. For what I know of the current (implicit) shape of the DAG, we are dealing with a tall, narrow tree, with relatively few side branches. Eventually we might have more considerable side branches, i.e. courtesy of CCTV monitors, portals, VR. But I'm not sure what you are writing is actually applicable.
Yeah, let's forget about that. @vampcat has convinced me of the folly of that decision. And even if we wanted to render multiple things (Graphs), we could just take all the graphs, squash (toposort) them into a single tasklist, and run them sequentially.
- What do you mean with "inject tasks"?
I was thinking of the specifics of getting the objects from the modules into the RenderGraph's Context, but I don't think we need to worry about that just yet. We could probably use the same NodeBuilder syntax that @manu3d and @tdgunes came up with for injection.
- Can you please clarify what you mean with "allow us to give them the main buffer to work on at runtime"?
Okay, let's discuss Nodes.
The way the system works right now, is that the WorldRenderer builds the Nodes at init, then flattens them and extracts a tasklist from them. If the RenderGraph changes, the tasklist is rebuilt.
Which is fine, but a bit inflexible.
I suggest we refactor the Nodes a bit, so that we (the Manager) can decide what their input and output FBOs are. This instantly makes the Nodes far more flexible with very little work. Once again, note that this represents the working FBO that the layer is writing to. All nodes have access to other FBOs through the FBOManager. Beyond that, Nodes shouldn't try to operate on things outside their Layers, aside from the FBO they were given when they were run. We'll have special mechanisms for handling things that need to cross Layers.
Honestly, this is probably something that should be implemented anyway irrespective of this project.
One caveat of this approach:
- If a Node is meant to process the current FBO, it has to not try to read it from the FBOManager and just use the one it was given; anything else would defeat the whole point.
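To make the refactor concrete, here's a minimal sketch of the kind of Node interface I mean (names are assumptions, not the current engine API):

```java
/**
 * Sketch only: a Node that is handed its working FBOs by the Manager at
 * wiring time, instead of fetching them from the FBOManager itself.
 */
public interface ReconfigurableNode {

    /** Called by the Manager whenever the graph is (re)wired. */
    void setInputFbo(FBO input);

    /** The working FBO this Node must draw to when processed. */
    void setOutputFbo(FBO output);

    /** Renders using only the FBOs supplied above. */
    void process();
}
```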
Before I get to the rest, I need to clarify some things.
First, some motivation. The main goals of this architecture are:
- Not having 190 lines of code in the initialize function (I counted).
- Making the API and process as declarative as possible (without becoming JSON or something; we don't need it to be that easy).
- Creating as many logical divisions between the Nodes as possible (in a way that isn't completely ridiculous). This means grouping Nodes into Modules, and Modules into Layers. (Or should we skip Modules and go directly to Layers? Not so sure about this one.)
- For example, the current render graph has 4 resampling nodes right after each other. We could group those in a module.
- Making modules modular. You should be able to connect modules to each other in ways the designer of said module did not expect.
- Making modules reusable. It should be possible to use a shader module (or a tint module) defined in the engine to apply a constant effect to any FBO you want with minimal effort.
- Making debugging much easier. Implementing the render debugger would be almost trivial, if we get this architecture working.
- Maintaining performance. All this stuff looks expensive, but we can fairly easily resolve all the dependencies and URIs and whatnot, squash the whole thing, and then use the resulting list of Nodes to render. Layers are also basically semantic constructs, with almost zero overhead in normal operation.
A note on Modules:
- A module is a logical grouping of Nodes. It has no correlation with the module system that Terasology uses, and I should probably find a better name for it.
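Roughly, I picture something like this (a sketch; all the names are made up):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: a render-graph "Module" is just an ordered group of Nodes that
// performs one logical step, e.g. the four resampling nodes in a row.
public class RenderModule {
    private final SimpleUri uri;
    private final List<Node> nodes = new ArrayList<>();

    public RenderModule(SimpleUri uri) {
        this.uri = uri;
    }

    public RenderModule add(Node node) {
        nodes.add(node);
        return this; // chainable, builder-style
    }

    public List<Node> getNodes() {
        return Collections.unmodifiableList(nodes);
    }
}
```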
Layers
When I was talking of layers, I meant these:
Each layer represents a different set of dependent modules to be rendered, and was also kinda inspired by your issue, @manu3d. A layer may be rendered after the previous one, or it may be declared as a sublayer with the help of a special Node. Each Layer passes around an FBO to its modules, which they are supposed to draw to. Splitting things into layers can be extremely arbitrary. A major goal here is dealing with the dependency issue @manu3d mentioned. One possible breakdown of layers (for representative purposes; this is probably terrible for a whole host of reasons):
- Game UI
- Stats page (like the one Minecraft has)
- NUI interfaces
- Foreground objects
- Middleground objects
- Background objects
- Outputs from other render targets (CCTV monitors, portals, whatever)
The main motivation for layers is that they are basically sandboxed; they (logically) draw their output to a separate FBO. The main benefit of this approach is that, assuming the modules that set things like overlays set alpha channels correctly, the debugger becomes a GCI task. Here's what we could do:
- Move the offending nodes up to a new SubLayer.
- Tell that Layer to enable sandboxing.
- Add a tint Node to the end of the Layer.
- Profit.
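In code, those four steps might boil down to something like this (every method and class here is an assumed API, purely for illustration):

```java
// Sketch: isolate the suspect nodes and tint their output so the
// debugger can see exactly what they contribute.
Layer debugLayer = renderGraph.extractToSubLayer(suspectNodes);
debugLayer.setSandboxed(true);              // draw to a private FBO
debugLayer.append(new TintNode("magenta")); // make the contribution obvious
```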
A note on layer sandboxing:
- The goal is to separate the effect of a layer from that of all the other layers.
- This can be achieved by the following process:
- The first Node of the Layer is given a new FBO to write to.
- That FBO is passed to each Node in the Layer.
- The final output in that FBO is then blended onto the Layer's actual target.
Each Node object's URI specifies which Layer it belongs to.
The NodeManager/WorldRenderer (or whatever's running this whole show) is responsible for providing the Layer with its target FBO.
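Put together, the sandboxing procedure might look roughly like this (a sketch with assumed names throughout; `blendOnto()` in particular doesn't exist):

```java
// Sketch: the Manager allocates a private FBO for the Layer, hands it to
// every Node in the Layer, then composites the result onto the Layer's
// real target afterwards.
FBO layerFbo = fboManager.create(layer.getUri(), FULL_SCALE);

for (Node node : layer.getNodes()) {
    node.setOutputFbo(layerFbo); // nodes draw only into the sandbox
    node.process();
}

// Alpha-blend the sandboxed result onto the target the Manager provided.
blendOnto(layerFbo, layer.getTargetFbo());
```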
Dependencies:
- Regarding dependencies: say node Engine:A renders the landscape and node Module:B recolors the output of A. Then comes node OtherModule:C who wants to add a lens flare to it all. You could say that in this circumstance node C is dependent on node B, in the sense that it should be processed after B has been processed, to take advantage of B's output. Except it's not strictly necessary. If node B is not present in the graph, node C could work directly with the output of A. So, what is node C really dependent on? Or rather, more generically, how does a node really know where to place itself? Layers would help, for sure, but within a layer I'm not sure I've heard or come up with a generic solution other than asking the user to use his judgement and do the wiring via a graphical interface.
I've been sitting in front of this open tab for a few hours now, trying to work this one out, and I'm more convinced than ever that layers is the right solution. Here's what I've thought of.
- The complicated way would be to make B depend on A, and create a special extra slot at the end of a layer for after-effects processing. We could then move all those Nodes there. If multiple Nodes want to do fancy processing on the output FBO, we could clone the FBO, send a copy to each to be processed, and blend/overlay/add/whatever them at the end. This tactic could also be used to solve some general dependency conflicts.
- The simple way would be to move module C to a higher layer. This specifies that C depends on the Layer containing A and B, not on any of the Nodes in the Layer. A bit of refactoring to support this, and problem solved. See the power of layers? (There's a sketch of this after the list.)
- And yes, this would probably lead to us resolving load order by creating tons of Layers. That should be okay though, since Layers don't actually add much overhead, and they completely vanish in the final TaskList.
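As a sketch (assumed API, hypothetical names), the "simple way" for the A/B/C example would look like:

```java
// Sketch: C declares a dependency on the Layer containing A and B,
// not on any individual Node inside it.
Layer baseScene = renderGraph.getLayer(new SimpleUri("engine:scenePass"));
Layer postFx = renderGraph.createLayerAbove(baseScene);

// C runs after *whatever* baseScene contains, whether or not the
// recoloring node B is present in this particular graph.
postFx.add(lensFlareNode);
```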
To sum up (tl;dr):
A tentative list of changes:
- Modify each Node/NodeBuilder/whatever to take input and output at the time of render pipeline init.
- Create a Module class (maybe builder) to create a sub-DAG of Nodes that perform a single logical function.
- Create a Layer class (maybe builder) to create a DAG of Modules that draw to a single FBO (which may or may not be the one everyone else is drawing to).
- Create a 'smart' Manager (or add the functionality into the WorldRendererImpl) to set this whole thing up: connect the Modules and Layers together, resolve load conflicts by moving things through layers, and squash the whole thing into a TaskList.
Current concerns:
- Would generating the tasklist be too expensive to run every time the graph rebuilds? We may need to rebuild only the relevant module, but that would require tracking which parts of the graph each change actually affects.
- Is it very difficult, for whatever reason, to insist that the tasks take their FBO input and output targets at runtime? I haven't looked into the code in too much depth, so I'm not sure.
- We may have too many layers of tangled abstractions. Should we just get rid of the Modules?
- Is the layer system overkill? There probably aren't going to be many people writing code that directly interferes with the render pipeline, after all. And most of them can just use a NUI overlay or something.
- Do things that create partial overlays(like NUI windows) use alpha channels correctly to allow things behind them to shine through? If they do, we're all good. Otherwise, I've got no clue how to get this to work.
- How heavy is the overhead from creating a bunch of extra FBOs and then overlaying them on each other? If it's within 2-3x the optimized pipeline, it should be fine for render debugging.
- Designing the code for building the pipeline is probably gonna feel like writing a compiler (there's a sketch after this list).
- First we load the Modules, which also load their Nodes and Tasks.
- Then we construct an in-memory graph with all the modules by resolving the URIs.
- We resolve conflicts by moving things through Layers.
- Then we squash the whole thing into a TaskList.
- I'm not sure how to move, create, and link Layers yet. Do we allow Layers to basically be treated as subroutines embedded within other Layers? How do we deal with Layer dependencies? Do Layers even have dependencies? (Yeah, this is as far as I've thought right now.)
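Here's the compiler analogy as a sketch (all names below are assumptions, not existing engine API):

```java
import java.util.Collection;

// Sketch: the four build steps from the list above, squashing Modules
// and Layers down to a flat TaskList.
public TaskList buildPipeline(Collection<RenderModule> modules) {
    RenderGraph graph = new RenderGraph();

    // 1. Load the Modules, which register their Nodes (and Tasks).
    modules.forEach(m -> m.getNodes().forEach(graph::addNode));

    // 2. Resolve URI references into concrete graph edges.
    graph.resolveUris();

    // 3. Resolve conflicts by moving Nodes through Layers.
    graph.resolveConflictsByLayers();

    // 4. Topologically sort and squash into a flat TaskList; Layers and
    //    Modules vanish here, so they cost nothing at render time.
    return graph.topologicalSort().toTaskList();
}
```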
I'm no architect, and this architecture is probably gonna need a lot more revisions to get into shape. Any suggestions at all would be immensely helpful.
And now I'm gonna go find a bucket of ice to try to cool down my overheated brain.