Another render architecture overhaul

Discussion in 'Developer Portal' started by Hybrid, Mar 19, 2017 at 7:00 PM.

  1. Hybrid

    Hybrid New Member

    The current DAG system is pretty good, but it's also kinda complicated. I'd like to suggest a slightly more abstract architecture for the renderer here.
    The core ideas here:
    • The whole target frame is split up into multiple layers, with one layer for each logical partition of the process. So, there would be one layer for basic entities, one for applying textures to entities, one for NUI, etc.
     • Each layer and each Node get ResourceUrn names, namespaced to the module they are in.
    • Each layer also automatically gets two nodes signalling its start and end.
     • Each Node lists the Nodes it depends on through ResourceUrns.
     • Each layer consists of its own DAG, which specifies its render pipeline. All the nodes in a layer share a Context and an FBOManager. They can only access data from other layers explicitly, in the interest of stopping all those weird artifacts from non-deterministic ordering in parallel processing.
    • Each layer outputs to a single buffer. Each Node is invoked with a pointer to this FBO, and it is expected to write to it if it outputs. This helps in writing nodes used in multiple layers, such as tint nodes that could be used in the visual debugger.
    • If a Node wants to access an FBO from another layer, it needs to do so explicitly. For example, if a node in layer A wanted to access a resource in layer B:
      1. Block layer A.
      2. Wait for the currently running node in layer B to finish.
      3. Run the node in A with the FBO, block layer B.
      4. Continue as usual.
       Never mind all that. We don't need it, at least not yet.

    • When all the layers are complete, the buffers are combined to the output.
    This also has the happy side effect of making it really easy to render in parallel, at least at the layer level. We could also implement isolated sublayers somehow, to extend this for layers with really heavy rendering.
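     To make the naming and dependency ideas above a bit more concrete, here is a minimal sketch of what the per-node contract could look like. Everything below (interface name, method names, exact import paths) is an assumption for illustration, not the existing engine code:

     import java.util.List;

     import org.terasology.assets.ResourceUrn;     // engine identifier type (package path may differ by version)
     import org.terasology.rendering.opengl.FBO;   // engine FBO wrapper (package path may differ by version)

     // Hypothetical sketch only, not the current Terasology API.
     public interface RenderNode {

         /** Module-scoped name of this node, e.g. "myModule:tintNode" (made-up example). */
         ResourceUrn getUrn();

         /** Urns of the Nodes this node depends on. */
         List<ResourceUrn> getDependencies();

         /** Draw into the FBO handed over by the layer that invoked this node. */
         void process(FBO layerOutput);
     }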

    We also want to support modules, and here's one way we could do it:
    • Create a GraphManager class to manage Layers and Nodes, and expose some API functions for adding them.
     • Each module consists of a set of Layers (if the module wants to create new ones), and a set of Nodes, each with a desired Urn (which mentions which layer it's part of) and a list of dependencies. Add these to the graph through the API.
    • I'm not really sure how to inject tasks properly, any suggestions?
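     Regarding the GraphManager bullet above, a rough, hypothetical sketch of what the module-facing registration API could look like (class and method names are made up; it reuses the RenderNode and ResourceUrn types from the sketch in the previous section):

     // Hypothetical sketch only; names are illustrative.
     public class GraphManager {

         /** Registers a new layer under a module-scoped name, e.g. "myModule:glowLayer". */
         public void addLayer(ResourceUrn layerUrn) {
             // ... create the layer, plus its automatic start/end nodes ...
         }

         /** Registers a node into an existing layer; its dependencies are resolved later. */
         public void addNode(ResourceUrn layerUrn, RenderNode node) {
             // ... attach the node to the layer's DAG ...
         }
     }

     // A module would then do something along the lines of:
     //     graphManager.addLayer(new ResourceUrn("myModule:glowLayer"));
     //     graphManager.addNode(new ResourceUrn("myModule:glowLayer"), new GlowNode());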
    Changes to current codebase:
     • The nodes would have to be refactored slightly to allow us to give them the main buffer to work on at runtime, and they would have to specify their own Urn, and the Urns for their dependencies.
     So, any opinions? Is it a good idea? Does it solve the problem? Do I have no idea what I'm talking about?
    Last edited: Mar 21, 2017 at 11:21 AM
  2. manu3d

    manu3d Pixel Forge Artisan

     Complicated? Please check out a commit from mid-2014, just before I started working on Terasology, and take a look at the rendering code. Then we'll talk complicated. And inflexible. And fragile. Functional, definitely, but untouchable. And guess who ended up touching it so that other people can now aspire to improve it? ;)

    But let me stop my back-patting exercise as I'm sure I'll get cramps from it. Let's get to your suggestions instead: I like them. And I like how there's a good sense of overview of what needs to be done. Certainly GSOC project material here. More specifically:
    • The whole target frame is split up into multiple layers, with one layer for each logical partition of the process. So, there would be one layer for basic entities, one for applying textures to entities, one for NUI, etc.
     I'm not sure what you mean with "target frame" exactly, but with a few other interested parties we have been discussing splitting the DAG into layers. It seems one logical way forward if not -the- way forward. The example layers you provide do not seem to relate too much to the current renderer - there's probably some debate to be had there - but the general idea is in line with planned work. That being said, not all the work that is planned has made it out of my head yet, so do not fault yourself for not finding a detailed roadmap.
     • Each layer and each Node get ResourceUrn names, namespaced to the module they are in.
     This was also discussed during last year's GSOC. We simply didn't get around to doing it for lack of time. One probable tweak: ResourceUrns are probably overkill as we don't need all their features. We might want to switch everything to SimpleURIs, a very similar but slightly lighter class.
    • Each layer also automatically gets two nodes signalling its start and end.
     This is interesting: so far I hadn't thought about using nodes for this purpose, even though I'm using marker tasks in a similar but more fine-grained way. The main problem I see with this is that we haven't quite fully defined what the edges of the graph mean, which in turn means we haven't quite fully looked at what the graph looks like. Encapsulating a layer between two nodes is simple and appealing, but it makes me wonder if it will be practically possible. One of the things I've been working on is to refactor the renderer to the point where the relationship between nodes can be clearly visualized. I was going to look at the possibility of this higher abstraction layer only with that "map" in front of me. I'm also wary of imposing too many boundaries on future developers.

    In any case I'd recommend dropping the term layer. It's too ambiguous semantically. It crops up in all sorts of different contexts. We might want to talk in terms of sub-graphs, sub-dags or dag branches - something more related to the fundamentals of the rendering DAG.
     • Each Node lists the Nodes it depends on through ResourceUrns.
    Sure, that's one way to do it. I wouldn't mind direct references either though.

     I'm going to split the following point you make into multiple points.
     • Each layer consists of its own DAG, which specifies its render pipeline.
    The first sentence above is a given with a graph: each portion of a graph tends to be a graph. The second sentence is a bit more puzzling. What do you mean?
     • All the nodes in a layer share a Context and an FBOManager. They can only access data from other layers explicitly, in the interest of stopping all those weird artifacts from non-deterministic ordering in parallel processing.
     This segregation might make debugging easier, but sounds a bit expensive from a GPU-memory perspective. I might be convinced otherwise, but I'd like to discuss this in front of a diagram showing the renderer as it is now (moving beyond the fact it is implemented as a list) and how the renderer would look with this segregation in place. I'm also not sure about individual Contexts per layer, but again I can be convinced otherwise. We just need to flesh this idea out a bit more.
    • Each layer outputs to a single buffer. Each Node is invoked with a pointer to this FBO, and it is expected to write to it if it outputs. This helps in writing nodes used in multiple layers, such as tint nodes that could be used in the visual debugger.
     Well, it should be noted that some nodes do not write to the primary gBuffers but to more "technical" or "intermediate" buffers that are then used as input by other nodes. So, this can't be as strict as described. But it is certainly in line with what I had in mind, each render node (vs computing node vs data node) being set to output to an FBO externally, via method call, rather than internally, within the node's source code. So, there are certainly points of contact there.
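     Just to illustrate the "externally, via method call" part, a rough sketch - none of this is the current code, and the class and method names are purely illustrative:

     import org.terasology.rendering.opengl.FBO;   // engine FBO wrapper (package path may differ by version)

     // Hypothetical sketch: the output FBO is assigned from outside the node.
     public abstract class AbstractRenderNode {

         private FBO outputFbo;

         /** Called by whoever wires up the graph, instead of the node binding a hard-coded FBO internally. */
         public void setOutputFbo(FBO fbo) {
             this.outputFbo = fbo;
         }

         /** Concrete nodes render into whatever FBO was assigned above. */
         protected FBO getOutputFbo() {
             return outputFbo;
         }
     }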

     In regard to the visual debugger, the tricky part is to give flexibility to node developers, so that they can output to whatever FBO they need rather than those types provided by Terasology. I guess we could provide a number of default FBOInspector nodes, capable of displaying the types of FBOs we already use, then leaving modules the possibility of providing additional ones. In fact, ideally, we should probably force a rendering node to provide an FBO inspector node capable of displaying the node's output: the node could then provide a custom one or simply point to one of the Terasology-provided ones.
    • If a Node wants to access an FBO from another layer, it needs to do so explicitly. For example, if a node in layer A wanted to access a resource in layer B:
      1. Block layer A.
      2. Wait for the currently running node in layer B to finish.
      3. Run the node in A with the FBO, block layer B.
      4. Continue as usual.
    • When all the layers are complete, the buffers are combined to the output.
     In the block above you seem to be hinting at some degree of CPU-side parallelism, with different layers potentially being processed at the same time. From what I know of the current (implicit) shape of the DAG, we are dealing with a tall, narrow tree, with relatively few side branches. Eventually we might have more considerable side branches, e.g. courtesy of CCTV monitors, portals, VR. But I'm not sure what you are writing is actually applicable.
    • This also has the happy side effect of making it really easy to render in parallel, at least at the layer level. We could also implement isolated sublayers somehow, to extend this for layers with really heavy rendering.
     As hinted above, right now I do not see how applicable CPU-side parallelism is. And GPU-side, of course, every rendering takes advantage of the GPU's parallelism via shaders. Can you please elaborate on these aspects? E.g. with some use cases?

    (modules support)
    • Create a GraphManager class to manage Layers and Nodes, and expose some API functions for adding them.
    Short answer: sure. Longer answer: we'll need to evaluate if a proper, highly featured Manager is warranted or if we'll be dealing with a simple RenderGraph wrapper, or even giving access to -the- RenderGraph instance itself.
     • Each module consists of a set of Layers (if the module wants to create new ones), and a set of Nodes, each with a desired Urn (which mentions which layer it's part of) and a list of dependencies. Add these to the graph through the API.
    Sure, it's in line with the current plan.
    • I'm not really sure how to inject tasks properly, any suggestions?
    What do you mean with "inject tasks"?
     • The nodes would have to be refactored slightly to allow us to give them the main buffer to work on at runtime, and they would have to specify their own Urn, and the Urns for their dependencies.
    Can you please clarify what you mean with "allow us to give them the main buffer to work on at runtime"?

    Regarding dependencies: say node Engine:A renders the landscape and node Module:B recolors the output of A. Then comes node OtherModule:C who wants to add a lens flare to it all. You could say that in this circumstance node C is dependent on node B, in the sense that it should be processed after B has been processed, to take advantage of B's output. Except it's not strictly necessary. If node B is not present in the graph, node C could work directly with the output of A. So, what is node C really dependent on? Or rather, more generically, how does a node really know where to place itself? Layers would help, for sure, but within a layer I'm not sure I've heard or come up with a generic solution other than asking the user to use his judgement and do the wiring via a graphical interface.

    To conclude, thank you for the opportunity to extract myself from the nitty gritty work on the DAG and look at the next stage of development of the DAG. I personally consider your proposal more of a DAG 1.0 rather than a 2.0 simply because right now we are at DAG 0.5 really. It's transitioning toward an original vision that is not really that dissimilar from yours, but it's certainly a slow process due to lack of human resources/time. We are certainly hoping that the GSOC this year, like last year, will be able to give a new momentum to the development in this area.

    I look forward to your reply.
  3. vampcat

    vampcat New Member

    Okay, so I finally registered, just to reply to this thread.

     The ideas are damn interesting, but here are a few thoughts that crossed my mind.

     - DON'T render with multiple threads. You just don't. The GPU is already using all its cores and resources when you ask it to render via one thread, asking it to render via multiple will only slow down every single process, and will yield no performance benefit. Plus, there is the problem of contexts and displays being specific only to the current thread in a lot of libraries, along with lack of thread-safe code.. So that won't work. Plus, the headache of ensuring their isolation in edge cases like buffer swaps.. Will it be worth the time we invest in it?

    - I'm not sure why we need separate FBOManagers for each "layer", especially considering the fact that there are different types of FBOs (although we *can* have a dumb FBOManager and smart FBOs.. but that's a different discussion) and so we'll need at least 2 (one resolutionDependent, one resolutionIndependent) FBOManagers for each layer. Why a different context for each layer? I have slightly different concerns about this than manu3d.

    That being said, nice ideas! :D
    Last edited: Mar 20, 2017 at 6:54 PM
  4. Hybrid

    Hybrid New Member

    Well, that's about 50x the reply I was expecting to get. Thanks for the thorough dissection, @manu3d.

     My bad for taking so long, it's just that every time I thought I had it down, I'd notice something else that could break.

    This reads like a terrible wiki page. Sorry for that. I'll clean it up as soon as I can.

    And before we begin, can I just mention how wonderful of a learning experience this is proving to be. I don't think I've ever had to write a spec for something with this kind of scope or detail before, and it's pretty fun, if you ignore the incoherent screaming and the sound of heads being banged on walls :)

    Okay, let's start with the easy ones.
    • Okay, so I finally registered, just to reply to this thread.
    You honor me, @vampcat (awesome handle, by the way :) ).
     • DON'T render with multiple threads. You just don't. The GPU is already using all its cores and resources when you ask it to render via one thread, asking it to render via multiple will only slow down every single process, and will yield no performance benefit. Plus, there is the problem of contexts and displays being specific only to the current thread in a lot of libraries, along with lack of thread-safe code.. So that won't work. Plus, the headache of ensuring their isolation in edge cases like buffer swaps.. Will it be worth the time we invest in it?
    Yeah, that makes sense. I didn't really understand how this worked before. And a fair bit of the reason I designed things the way I did was for dealing with those edge cases, so a lot of that can now change.
    • I'm not sure why we need separate FBOManagers for each "layer", especially considering the fact that there are different types of FBOs (although we *can* have a dumb FBOManager and smart FBOs.. but that's a different discussion) and so we'll need at least 2 (one resolutionDependent, one resolutionIndependent) FBOs for each layer. Why a different context for each layer? I have slightly different concerns about this than manu3d.
     I thought this over, and if we namespace FBOs correctly, we should be fine with just one FBOManager global to the (current) RenderGraph. The separate contexts were for the sake of allowing even more flexibility with Node design, but that's probably a little overboard, for now at least. They were also meant to help in making everything thread-safe, which, as you pointed out, we don't really care about.
     • Complicated? Please check out a commit from mid-2014, just before I started working on Terasology, and take a look at the rendering code. Then we'll talk complicated. And inflexible. And fragile. Functional, definitely, but untouchable. And guess who ended up touching it so that other people can now aspire to improve it? ;)
    Tried going through that code. My brain threatened to melt out my ears 5 minutes in. I have absolutely no idea how you managed to fix that.
    Also, the new render code is surprisingly clear. Still fragile and inflexible, but clear enough that it only took a moderately skilled Java programmer 2-ish hours to understand pretty much the whole thing. You can pat yourself on the back for that :)
    • I'm not sure what you mean with "target frame" exactly
     Here I was basically referring to the final render target. Do note that this could be something completely different from the main screen, e.g. rendering CCTV feeds to a monitor texture.
     • This was also discussed during last year's GSOC. We simply didn't get around to doing it for lack of time. One probable tweak: ResourceUrns are probably overkill as we don't need all their features. We might want to switch everything to SimpleURIs, a very similar but slightly lighter class.
    Noted.
    • Sure, that's one way to do it. I wouldn't mind direct references either though.
     The main reason I want to do this with URIs is so that Nodes can easily reference other Nodes, particularly standard Nodes defined in the engine, without having to worry about injection and contexts and stuff, something that we don't want module designers to really worry about. Kind of like how DNS simplifies connecting servers together.
     • In the block above you seem to be hinting at some degree of CPU-side parallelism, with different layers potentially being processed at the same time. From what I know of the current (implicit) shape of the DAG, we are dealing with a tall, narrow tree, with relatively few side branches. Eventually we might have more considerable side branches, e.g. courtesy of CCTV monitors, portals, VR. But I'm not sure what you are writing is actually applicable.
     Yeah, let's forget about that. @vampcat has convinced me of the folly of that decision. And even if we wanted to render multiple things (Graphs), we could just take all the graphs and squash (toposort) them into a single tasklist, and run them sequentially.
    • What do you mean with "inject tasks"?
    I was thinking of the specifics of getting the objects from the modules into the RenderGraph's Context, but I don't think we need to worry about that just yet. We could probably use the same NodeBuilder syntax that @manu3d and @tdgunes came up with for injection.
    • Can you please clarify what you mean with "allow us to give them the main buffer to work on at runtime"?
    Okay, let's discuss Nodes.

     The way the system works right now is that the WorldRenderer builds the Nodes at init, then flattens them and extracts a tasklist from them. If the RenderGraph changes, the tasklist is rebuilt.

    Which is fine, but a bit inflexible.

     I suggest we refactor the Nodes a bit, so that we (the Manager) can decide what their input and output FBOs are. This instantly makes the Nodes far more flexible with very little work. Once again, note that this represents the working FBO that the layer is writing to. All nodes have access to other FBOs through the FBOManager. Then, Nodes shouldn't try to operate on things outside their Layers, aside from the FBO they were given when they were run. We'll have special mechanisms for handling things that need to cross Layers.

    Honestly, this is probably something that should be implemented anyway irrespective of this project.

    One caveat of this approach:
     • If a Node is meant to process the current FBO, it must not try to read it from the FBOManager; it should just use the one it was given, as anything else would defeat the whole point.
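     A minimal sketch of what such a refactored Node could look like, caveat included. All names are made up for illustration; this is not the current WorldRenderer code:

     import org.terasology.rendering.opengl.FBO;   // engine FBO wrapper (package path may differ by version)

     // Hypothetical sketch only.
     public class TintNode {

         private FBO workingFbo;   // the layer's buffer, assigned by the manager at pipeline-build time

         /** The manager decides which buffer this node works on, not the node itself. */
         public void setWorkingFbo(FBO fbo) {
             this.workingFbo = fbo;
         }

         public void process() {
             // Caveat from the bullet above: use the FBO we were handed rather than
             // asking the FBOManager for the "current" one, or the indirection is pointless.
             // ... bind workingFbo and draw the tint pass into it ...
         }
     }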

    Before I get to the rest, I need to clarify some things.

    First, some motivation. The main goals of this architecture are:
    • Not having 190 lines of code in the initialize function (I counted).
     • Making the API and process as declarative as possible (without becoming JSON or something; we don't need it to be that easy).
     • Creating as many logical divisions between the Nodes as possible (in a way that isn't completely ridiculous). This means grouping Nodes into Modules, and Modules into Layers. (Or should we skip Modules and go directly to Layers? Not so sure about this one.)
    • For example, the current render graph has 4 resampling nodes right after each other. We could group those in a module.
    • Making modules modular. You should be able to connect modules to each other in ways the designer of said module did not expect.
     • Making modules reusable. It should be possible to use a shader module (or a tint module) defined in the engine, to apply a constant effect to any FBO you want with minimal effort.
    • Making debugging much easier. Implementing the render debugger would be almost trivial, if we get this architecture working.
     • Maintaining performance. All this stuff looks expensive, but we can fairly easily resolve all the dependencies and URIs and stuff and squash the whole thing, and then use the resulting list of Nodes to render. Layers are also basically semantic constructs, and have almost zero overhead in normal operation.
    A note on Modules:
    • A module is a logical grouping of Nodes. It has no correlation with the module system that Terasology uses, and I should probably find a better name for them.

    Layers

    When I was talking of layers, I meant these:
     [image: diagram of the proposed layers]

     Each layer represents a different set of dependent modules to be rendered, and was also kinda inspired by your issue, @manu3d. A layer may be rendered after the previous one, or it may be declared as a sublayer with the help of a special Node. Each Layer passes around an FBO to its modules, which they are supposed to draw to. Splitting things into layers can be extremely arbitrary. A major goal here is dealing with the dependency issue @manu3d mentioned. One possible breakdown of layers (for representative purposes; this is probably terrible for a whole host of reasons):
    • Game UI
     • Stats page (like the one Minecraft has)
     • NUI interfaces
    • Foreground objects
    • Middleground objects
    • Background objects
     • Outputs from other render targets (CCTV monitors, portals, whatever)
     The main motivation for layers is that they are basically sandboxed; they (logically) draw their output to a separate FBO. The main benefit of this approach is that, assuming the modules that set things like overlays set alpha channels correctly, the debugger becomes a GCI task. Here's what we could do:
    • Move the offending nodes up to a new SubLayer.
    • Tell that Layer to enable sandboxing.
    • Add a tint Node to the end of the Layer.
    • Profit.
    A note on layer sandboxing:
    • The goal is to separate the effect of a layer from that of all the other layers.
    • This can be achieved by the following process:
    • The first Node of the Layer is given a new FBO to write to.
    • That FBO is passed to each Node in the Layer.
     • The final output in that FBO is then composited back onto the main output.

    Each Node object's URI specifies which Layer it belongs to.
     The NodeManager/WorldRenderer (or whatever's running this whole show) is responsible for providing the Layer with its target FBO.
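     Pulling the sandboxing notes above together, a rough sketch of how a Layer could hand its working FBO to its Nodes and composite the result back. All names are hypothetical, reusing the RenderNode and FBO sketches from the first post:

     import java.util.ArrayList;
     import java.util.List;

     // Hypothetical sketch only.
     public class Layer {

         private final List<RenderNode> nodes = new ArrayList<>();
         private boolean sandboxed;
         private FBO sandboxFbo;   // fresh buffer handed to the layer's nodes when sandboxing is on

         public void render(FBO mainOutput) {
             FBO target = sandboxed ? sandboxFbo : mainOutput;
             for (RenderNode node : nodes) {
                 node.process(target);   // every node in the layer draws into the same buffer
             }
             if (sandboxed) {
                 // Composite the isolated result back onto the main output,
                 // relying on the layer's nodes having set the alpha channel correctly.
                 overlay(sandboxFbo, mainOutput);
             }
         }

         private void overlay(FBO source, FBO destination) {
             // ... full-screen pass that blends source over destination ...
         }
     }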

    Dependencies:
    • Regarding dependencies: say node Engine:A renders the landscape and node Module:B recolors the output of A. Then comes node OtherModule:C who wants to add a lens flare to it all. You could say that in this circumstance node C is dependent on node B, in the sense that it should be processed after B has been processed, to take advantage of B's output. Except it's not strictly necessary. If node B is not present in the graph, node C could work directly with the output of A. So, what is node C really dependent on? Or rather, more generically, how does a node really know where to place itself? Layers would help, for sure, but within a layer I'm not sure I've heard or come up with a generic solution other than asking the user to use his judgement and do the wiring via a graphical interface.
     I've been sitting in front of this open tab for a few hours now, trying to work this one out, and I'm more convinced than ever that layers are the right solution. Here's what I've thought of.
     • The complicated way would be to make B depend on A, and create a special extra slot at the end of a layer for aftereffects processing. We could then move all those Nodes there. If multiple Nodes want to do fancy processing on the output FBO, we could clone the FBO, send a copy to each to be processed, and blend/overlay/add/whatever them at the end. This tactic could also be used to solve some general dependency conflicts.
     • The simple way would be to move module C to a higher layer. This specifies that C depends on the Layer containing A and B, not any of the Nodes in the Layer. A bit of refactoring to support this, and problem solved. See the power of layers?
    • And yes, this would probably lead to us resolving load order by creating tons of Layers. That should be okay though, since Layers don't actually add much to overhead, and they completely vanish in the final TaskList.
    To sum up(tl;dr):

    A tentative list of changes:
    • Modify each Node/NodeBuilder/whatever to take input and output at the time of render pipeline init.
    • Create a Module class (maybe builder) to create a sub-dag of Nodes that perform a single logical function.
     • Create a Layer class (maybe builder) to create a dag of Modules that draw to a single FBO (that may or may not be the one everyone else is drawing to).
     • Create a 'smart' Manager (or add the functionality into the WorldRenderImpl) to set this whole thing up, connect the Modules and Layers together, resolve load conflicts by moving things through layers, and squash the whole thing into a TaskList.

    Current concerns:
     • Would generating the tasklist be too expensive to run every time the graph rebuilds? We may need to rebuild only the relevant module, but that would require keeping track of which parts of the graph actually changed.
     • Is it very difficult, for whatever reason, to insist that the tasks take their FBO input and output targets at runtime? I haven't looked into the code in too much depth, so I'm not sure.
    • We may have too many layers of tangled abstractions. Should we just get rid of the Modules?
    • Is the layer system overkill? There probably aren't going to be many people writing code that directly interferes with the render pipeline, after all. And most of them can just use a NUI overlay or something.
     • Do things that create partial overlays (like NUI windows) use alpha channels correctly to allow things behind them to shine through? If they do, we're all good. Otherwise, I've got no clue how to get this to work.
     • How heavy is the overhead from creating a bunch of extra FBOs and then overlaying them on each other? If it's within 2-3x the optimized pipeline, it should be fine for render debugging.
     • Designing the code for building the pipeline is probably gonna feel like writing a compiler (see the rough flattening sketch after this list).
       • First we load the Modules, which also load their Nodes and Tasks.
       • Then we construct a graph in-memory with all the modules by resolving the URIs.
       • We resolve conflicts by moving things through Layers.
      • Then we squash the whole thing into a TaskList.
    • I'm not sure how to move, create, and link Layers yet. Do we allow Layers to basically be treated as subroutines embedded within other Layers? How do we deal with Layer dependencies? Do Layers even have dependencies? (Yeah, this is as far as I've thought right now.)
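     A rough sketch of that final "squash into a TaskList" step, written as a plain Kahn-style topological sort over the hypothetical RenderNode interface sketched in my first post. Everything here is illustrative, not the current tasklist code:

     import java.util.ArrayDeque;
     import java.util.ArrayList;
     import java.util.Collections;
     import java.util.Deque;
     import java.util.HashMap;
     import java.util.List;
     import java.util.Map;

     // Hypothetical sketch only.
     public final class GraphFlattener {

         /** Flattens the node graph into an ordered list that can be executed sequentially. */
         public static List<RenderNode> flatten(Map<ResourceUrn, RenderNode> nodesByUrn) {
             Map<ResourceUrn, Integer> unmetDeps = new HashMap<>();
             Map<ResourceUrn, List<ResourceUrn>> dependents = new HashMap<>();

             for (RenderNode node : nodesByUrn.values()) {
                 unmetDeps.put(node.getUrn(), node.getDependencies().size());
                 for (ResourceUrn dep : node.getDependencies()) {
                     dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(node.getUrn());
                 }
             }

             // Start with every node whose dependencies are already satisfied.
             Deque<ResourceUrn> ready = new ArrayDeque<>();
             unmetDeps.forEach((urn, count) -> {
                 if (count == 0) {
                     ready.add(urn);
                 }
             });

             List<RenderNode> taskList = new ArrayList<>();
             while (!ready.isEmpty()) {
                 ResourceUrn current = ready.poll();
                 taskList.add(nodesByUrn.get(current));
                 for (ResourceUrn next : dependents.getOrDefault(current, Collections.emptyList())) {
                     if (unmetDeps.merge(next, -1, Integer::sum) == 0) {
                         ready.add(next);
                     }
                 }
             }

             if (taskList.size() != nodesByUrn.size()) {
                 throw new IllegalStateException("Cycle or missing dependency in the render graph");
             }
             return taskList;
         }
     }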

    I'm no architect, and this architecture is probably gonna need a lot more revisions to get into shape. Any suggestions at all would be immensely helpful.

    And now I'm gonna go find a bucket of ice to try to cool down my overheated brain.
  5. manu3d

    manu3d Pixel Forge Artisan

    Hi @Hybrid and thank you for your detailed answer.

     I think we are aligned on most things and similarly puzzled by a number of things that need to be better thought through. I don't have the time today unfortunately to respond in depth. Also tomorrow will be difficult. I should be able to respond on Wednesday night, but overall you are on a very good track.

     One request for today: please edit your message to remove all references to the word "Module" and replace it with something else. Terasology uses the word Module to describe what other software calls a "plugin" or "extension". A module is -the- way Terasology is extended. A module or a set of interdependent modules can provide anything from a completely new gaming experience to different world generators, from new textures to new sounds. One day the whole renderer could be an external module. Or nodes to be added to the renderer's DAG could be provided and inserted by an external module. So, in a vacuum your use of the word "module" would make sense. In the context of Terasology it overlaps with a very specific concept and creates ambiguity.

    In your post, if I am understanding you correctly, you use the term "module" to describe a group of nodes that are closely related to each other. Perhaps even atomically related to each other (shouldn't be separated). I'd therefore suggest you look for some terms reflecting this. NodeGroups and AtomicGroups perhaps. In this context it is appropriate to wonder if the "layer" abstraction is necessary. Perhaps it is and perhaps it can be implemented via NodeGroups rather than with special handling.

    I must stop here though. Much to do for the next couple of days. Stand-by for more.

    Meanwhile you might want to start drafting your proposal(s) in a Google Docs, and if you want we can continue this discussion right on the proposal.

    In any case thank you for your interest in this topic. This is a very productive conversation we are having.
  6. Hybrid

    Hybrid New Member

    Yeah, I did think that I would need a better name for Modules. Ontological conflicts are no fun.
     Modules are supposed to represent Nodes with strong dependencies, which means they must run in that order and must not be separated. AtomicGroup kinda suggests that the whole group is atomic, which isn't how it's supposed to work.
    I'll think of something.
    I've started working on the proposal in a Docs file, I'll link here when it's somewhat presentable.

    Also, I plan to clean up the render code a bit over the next few days, are there any parts of the code that are particularly messy?

    Thanks again for your time @manu3d, I wouldn't have figured out a tenth of that without your input. :D
    Last edited: Mar 21, 2017 at 11:24 AM
