Another render architecture overhaul

Discussion in 'Developer Portal' started by Hybrid, Mar 19, 2017.

  1. Hybrid

    Hybrid New Member

    The current DAG system is pretty good, but it's also kinda complicated. I'd like to suggest a slightly more abstract architecture for the renderer here.
    The core ideas here:
    • The whole target frame is split up into multiple layers, with one layer for each logical partition of the process. So, there would be one layer for basic entities, one for applying textures to entities, one for NUI, etc.
• Each layer and each Node get ResourceUrn names, namespaced to the module they are in.
    • Each layer also automatically gets two nodes signalling its start and end.
• Each Node lists the Nodes it depends on through ResourceUrns.
• Each layer consists of its own DAG, which specifies its render pipeline. All the nodes in a layer share a Context and an FBOManager. They can only access data from other layers explicitly, in the interest of stopping all those weird artifacts from non-deterministic ordering in parallel processing.
    • Each layer outputs to a single buffer. Each Node is invoked with a pointer to this FBO, and it is expected to write to it if it outputs. This helps in writing nodes used in multiple layers, such as tint nodes that could be used in the visual debugger.
    • If a Node wants to access an FBO from another layer, it needs to do so explicitly. For example, if a node in layer A wanted to access a resource in layer B:
      1. Block layer A.
      2. Wait for the currently running node in layer B to finish.
      3. Run the node in A with the FBO, block layer B.
      4. Continue as usual.
  Never mind all that. We don't need it, at least not yet.

    • When all the layers are complete, the buffers are combined to the output.
    This also has the happy side effect of making it really easy to render in parallel, at least at the layer level. We could also implement isolated sublayers somehow, to extend this for layers with really heavy rendering.
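To make the idea concrete, here's a rough Java sketch of what this could look like. All class and method names here are hypothetical, not existing Terasology code, and the FBO is faked as a list of draw operations so the flow is visible:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of the proposal (all names hypothetical, not existing Terasology code).
// A layer owns a mini-DAG of nodes that all write to one shared buffer.
class LayerSketch {
    static class Fbo { final List<String> writes = new ArrayList<>(); }

    interface Node {
        String urn();                 // e.g. "engine:sky", namespaced to its module
        List<String> dependsOn();     // URNs of nodes that must run first
        void process(Fbo target);     // every node writes to the layer's shared buffer
    }

    static class Layer {
        final String urn;
        final List<Node> nodes = new ArrayList<>(); // assumed already in dependency order
        Layer(String urn) { this.urn = urn; }

        Fbo render() {
            Fbo out = new Fbo();
            out.writes.add(urn + ":start");      // implicit start marker node
            for (Node n : nodes) n.process(out);
            out.writes.add(urn + ":end");        // implicit end marker node
            return out;
        }
    }

    // Concatenating the layers' buffers in order stands in for the final compositing step.
    static List<String> composite(List<Layer> layers) {
        List<String> frame = new ArrayList<>();
        for (Layer l : layers) frame.addAll(l.render().writes);
        return frame;
    }
}
```

The point of the sketch is only the shape: each layer is self-contained, gets automatic start/end markers, and the final frame is assembled from per-layer buffers.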

    We also want to support modules, and here's one way we could do it:
    • Create a GraphManager class to manage Layers and Nodes, and expose some API functions for adding them.
• Each module consists of a set of Layers (if the module wants to create new ones), and a set of Nodes, each with a desired Urn for itself (which mentions which layer it's part of) and a list of dependencies. Add these to the graph through the API.
    • I'm not really sure how to inject tasks properly, any suggestions?
    Changes to current codebase:
• The nodes would have to be refactored slightly to allow us to give them the main buffer to work on at runtime, and they would have to specify their own Urn, and the Urns for their dependencies.
So, any opinions? Is it a good idea? Does it solve the problem? Do I have no idea what I'm talking about?
    Last edited: Mar 21, 2017
  2. manu3d

    manu3d Pixel Forge Artisan

Complicated? Please check out a commit from mid-2014, just before I started working on Terasology, and take a look at the rendering code. Then we'll talk complicated. And inflexible. And fragile. Functional, definitely, but untouchable. And guess who ended up touching it so that other people can now aspire to improve it? ;)

    But let me stop my back-patting exercise as I'm sure I'll get cramps from it. Let's get to your suggestions instead: I like them. And I like how there's a good sense of overview of what needs to be done. Certainly GSOC project material here. More specifically:
    • The whole target frame is split up into multiple layers, with one layer for each logical partition of the process. So, there would be one layer for basic entities, one for applying textures to entities, one for NUI, etc.
I'm not sure what you mean with "target frame" exactly, but with a few other interested parties we have been discussing splitting the DAG into layers. It seems like one logical way forward, if not -the- way forward. The example layers you provide do not seem to relate too much to the current renderer - there's probably some debate to be had there - but the general idea is in line with planned work. That being said, not all the work that is planned has made it out of my head yet, so do not fault yourself for not finding a detailed roadmap.
• Each layer and each Node get ResourceUrn names, namespaced to the module they are in.
This was also discussed during last year's GSOC. We simply didn't get around to doing it for lack of time. One probable tweak: ResourceUrns are probably overkill as we don't need all their features. We might want to switch everything to SimpleURIs, a very similar but slightly lighter class.
    • Each layer also automatically gets two nodes signalling its start and end.
This is interesting: so far I hadn't thought about using nodes for this purpose, even though I'm using marker tasks in a similar but more fine-grained way. The main problem I see with this is that we haven't quite fully defined what the edges of the graph mean, which in turn means we haven't quite fully looked at what the graph looks like. Encapsulating a layer between two nodes is simple and appealing, but makes me wonder if it will be practically possible. One of the things I've been working on is to refactor the renderer to the point where the relationship between nodes can be clearly visualized. I was going to look at the possibility of this higher abstraction layer only with that "map" in front of me. I'm also wary of imposing too many boundaries on future developers.

    In any case I'd recommend dropping the term layer. It's too ambiguous semantically. It crops up in all sorts of different contexts. We might want to talk in terms of sub-graphs, sub-dags or dag branches - something more related to the fundamentals of the rendering DAG.
• Each Node lists the Nodes it depends on through ResourceUrns.
    Sure, that's one way to do it. I wouldn't mind direct references either though.

    I'm going to split the following point you make in multiple points.
• Each layer consists of its own DAG, which specifies its render pipeline.
    The first sentence above is a given with a graph: each portion of a graph tends to be a graph. The second sentence is a bit more puzzling. What do you mean?
• All the nodes in a layer share a Context and an FBOManager. They can only access data from other layers explicitly, in the interest of stopping all those weird artifacts from non-deterministic ordering in parallel processing.
This segregation might make debugging easier, but it sounds a bit expensive from a GPU-memory perspective. I might be convinced otherwise, but I'd like to discuss this in front of a diagram showing the renderer as it is now (moving beyond the fact it is implemented as a list) and how the renderer would look with this segregation in place. I'm also not sure about individual Contexts per layer, but again I can be convinced otherwise. We just need to flesh this idea out a bit more.
    • Each layer outputs to a single buffer. Each Node is invoked with a pointer to this FBO, and it is expected to write to it if it outputs. This helps in writing nodes used in multiple layers, such as tint nodes that could be used in the visual debugger.
Well, it should be noted that some nodes do not write to the primary gBuffers but to more "technical" or "intermediate" buffers that are then used as input by other nodes. So, this can't be as strict as described. But it is certainly in line with what I had in mind: each render node (vs computing node vs data node) being set to output to an FBO externally, via method call, rather than internally, within the node's source code. So, there are certainly points of contact there.

In regard to the visual debugger, the tricky part is to give flexibility to node developers, so that they can output to whatever FBO they need rather than only those types provided by Terasology. I guess we could provide a number of default FBOInspector nodes, capable of displaying the types of FBOs we already use, leaving modules the possibility of providing additional ones. In fact, ideally, we should probably force a rendering node to provide an FBO inspector node capable of displaying the node's output: the node could then provide a custom one or simply point to one of the Terasology-provided ones.
    • If a Node wants to access an FBO from another layer, it needs to do so explicitly. For example, if a node in layer A wanted to access a resource in layer B:
      1. Block layer A.
      2. Wait for the currently running node in layer B to finish.
      3. Run the node in A with the FBO, block layer B.
      4. Continue as usual.
    • When all the layers are complete, the buffers are combined to the output.
In the block above you seem to be hinting at some degree of CPU-side parallelism, with different layers potentially being processed at the same time. From what I know of the current (implicit) shape of the DAG, we are dealing with a tall, narrow tree, with relatively few side branches. Eventually we might have more considerable side branches, e.g. courtesy of CCTV monitors, portals, VR. But I'm not sure what you are writing is actually applicable.
    • This also has the happy side effect of making it really easy to render in parallel, at least at the layer level. We could also implement isolated sublayers somehow, to extend this for layers with really heavy rendering.
As hinted above, right now I do not see how applicable CPU-side parallelism is. And GPU-side, of course, every rendering takes advantage of the GPU's parallelism via shaders. Can you please elaborate on these aspects? E.g. with some use cases?

    (modules support)
    • Create a GraphManager class to manage Layers and Nodes, and expose some API functions for adding them.
    Short answer: sure. Longer answer: we'll need to evaluate if a proper, highly featured Manager is warranted or if we'll be dealing with a simple RenderGraph wrapper, or even giving access to -the- RenderGraph instance itself.
• Each module consists of a set of Layers (if the module wants to create new ones), and a set of Nodes, each with a desired Urn for itself (which mentions which layer it's part of) and a list of dependencies. Add these to the graph through the API.
    Sure, it's in line with the current plan.
    • I'm not really sure how to inject tasks properly, any suggestions?
    What do you mean with "inject tasks"?
• The nodes would have to be refactored slightly to allow us to give them the main buffer to work on at runtime, and they would have to specify their own Urn, and the Urns for their dependencies.
    Can you please clarify what you mean with "allow us to give them the main buffer to work on at runtime"?

    Regarding dependencies: say node Engine:A renders the landscape and node Module:B recolors the output of A. Then comes node OtherModule:C who wants to add a lens flare to it all. You could say that in this circumstance node C is dependent on node B, in the sense that it should be processed after B has been processed, to take advantage of B's output. Except it's not strictly necessary. If node B is not present in the graph, node C could work directly with the output of A. So, what is node C really dependent on? Or rather, more generically, how does a node really know where to place itself? Layers would help, for sure, but within a layer I'm not sure I've heard or come up with a generic solution other than asking the user to use his judgement and do the wiring via a graphical interface.

    To conclude, thank you for the opportunity to extract myself from the nitty gritty work on the DAG and look at the next stage of development of the DAG. I personally consider your proposal more of a DAG 1.0 rather than a 2.0 simply because right now we are at DAG 0.5 really. It's transitioning toward an original vision that is not really that dissimilar from yours, but it's certainly a slow process due to lack of human resources/time. We are certainly hoping that the GSOC this year, like last year, will be able to give a new momentum to the development in this area.

    I look forward to your reply.
  3. vampcat

    vampcat New Member

    Okay, so I finally registered, just to reply to this thread.

The ideas are damn interesting, but here are a few thoughts that crossed my mind.

- DON'T render with multiple threads. You just don't. The GPU is already using all its cores and resources when you ask it to render via one thread; asking it to render via multiple will only slow down every single process and will yield no performance benefit. Plus, there is the problem of contexts and displays being specific only to the current thread in a lot of libraries, along with the lack of thread-safe code. So that won't work. Plus, the headache of ensuring their isolation in edge cases like buffer swaps. Will it be worth the time we invest in it?

    - I'm not sure why we need separate FBOManagers for each "layer", especially considering the fact that there are different types of FBOs (although we *can* have a dumb FBOManager and smart FBOs.. but that's a different discussion) and so we'll need at least 2 (one resolutionDependent, one resolutionIndependent) FBOManagers for each layer. Why a different context for each layer? I have slightly different concerns about this than manu3d.

    That being said, nice ideas! :D
    Last edited: Mar 20, 2017
  4. Hybrid

    Hybrid New Member

    Well, that's about 50x the reply I was expecting to get. Thanks for the thorough dissection, @manu3d.

My bad for taking so long; it's just that every time I thought I had it down, I'd notice something else that could break.

    This reads like a terrible wiki page. Sorry for that. I'll clean it up as soon as I can.

And before we begin, can I just mention how wonderful a learning experience this is proving to be? I don't think I've ever had to write a spec for something with this kind of scope or detail before, and it's pretty fun, if you ignore the incoherent screaming and the sound of heads being banged on walls :)

    Okay, let's start with the easy ones.
    • Okay, so I finally registered, just to reply to this thread.
    You honor me, @vampcat (awesome handle, by the way :) ).
• DON'T render with multiple threads. You just don't. The GPU is already using all its cores and resources when you ask it to render via one thread; asking it to render via multiple will only slow down every single process and will yield no performance benefit. Plus, there is the problem of contexts and displays being specific only to the current thread in a lot of libraries, along with the lack of thread-safe code. So that won't work. Plus, the headache of ensuring their isolation in edge cases like buffer swaps. Will it be worth the time we invest in it?
    Yeah, that makes sense. I didn't really understand how this worked before. And a fair bit of the reason I designed things the way I did was for dealing with those edge cases, so a lot of that can now change.
• I'm not sure why we need separate FBOManagers for each "layer", especially considering the fact that there are different types of FBOs (although we *can* have a dumb FBOManager and smart FBOs, but that's a different discussion) and so we'll need at least 2 (one resolutionDependent, one resolutionIndependent) FBOManagers for each layer. Why a different context for each layer? I have slightly different concerns about this than manu3d.
I thought this over, and if we namespace FBOs correctly, we should be fine with just one FBOManager global to the (current) RenderGraph. The separate contexts were for the sake of allowing even more flexibility with Node design, but that's probably a little overboard, for now at least. It was also meant to help in making everything thread-safe, which, as you pointed out, we don't really care about.
• Complicated? Please check out a commit from mid-2014, just before I started working on Terasology, and take a look at the rendering code. Then we'll talk complicated. And inflexible. And fragile. Functional, definitely, but untouchable. And guess who ended up touching it so that other people can now aspire to improve it? ;)
    Tried going through that code. My brain threatened to melt out my ears 5 minutes in. I have absolutely no idea how you managed to fix that.
    Also, the new render code is surprisingly clear. Still fragile and inflexible, but clear enough that it only took a moderately skilled Java programmer 2-ish hours to understand pretty much the whole thing. You can pat yourself on the back for that :)
    • I'm not sure what you mean with "target frame" exactly
Here I was basically referring to the final render target. Do note that this could be something completely different from the main screen, e.g. rendering CCTV feeds to a monitor texture.
    • This was also discussed during last year's GSOC. We simply didn't get around doing it for lack of time. One probable tweak: ResourceUrns are probably overkill as we don't need all their features. We might want to switch everything to SimpleURIs, a very similar but slightly lighter class.
    Noted.
    • Sure, that's one way to do it. I wouldn't mind direct references either though.
The main reason I want to do this with URIs is so that Nodes can easily reference other Nodes, particularly standard Nodes defined in the engine, without having to worry about injection and contexts and stuff, something that we don't want module designers to really worry about. Kind of like how DNS simplifies connecting servers together.
• In the block above you seem to be hinting at some degree of CPU-side parallelism, with different layers potentially being processed at the same time. From what I know of the current (implicit) shape of the DAG, we are dealing with a tall, narrow tree, with relatively few side branches. Eventually we might have more considerable side branches, e.g. courtesy of CCTV monitors, portals, VR. But I'm not sure what you are writing is actually applicable.
Yeah, let's forget about that. @vampcat has convinced me of the folly of that decision. And even if we wanted to render multiple things (Graphs), we could just take all the graphs and squash (toposort) them into a single tasklist, and run them sequentially.
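For illustration, the squashing step could be a standard Kahn's topological sort. The node names and the map-based graph representation below are made up for the example, not actual engine code:

```java
import java.util.*;

// Hypothetical sketch: collapsing a dependency graph of nodes into one sequential
// task list with Kahn's topological sort. Node URNs are invented for illustration.
class GraphSquasher {
    // deps maps each node URN to the set of URNs it depends on
    static List<String> toposort(Map<String, Set<String>> deps) {
        Map<String, Integer> remaining = new HashMap<>();       // unmet dependency counts
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue())
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
        }
        // dependencies never listed as keys count as sources with zero dependencies
        for (String d : dependents.keySet()) remaining.putIfAbsent(d, 0);

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String m : dependents.getOrDefault(n, List.of()))
                if (remaining.merge(m, -1, Integer::sum) == 0) ready.add(m);
        }
        if (order.size() != remaining.size())
            throw new IllegalStateException("cycle in render graph");
        return order;
    }
}
```

Running multiple graphs sequentially then just means concatenating their sorted task lists.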
    • What do you mean with "inject tasks"?
    I was thinking of the specifics of getting the objects from the modules into the RenderGraph's Context, but I don't think we need to worry about that just yet. We could probably use the same NodeBuilder syntax that @manu3d and @tdgunes came up with for injection.
    • Can you please clarify what you mean with "allow us to give them the main buffer to work on at runtime"?
    Okay, let's discuss Nodes.

    The way the system works right now, is that the WorldRenderer builds the Nodes at init, then flattens them and extracts a tasklist from them. If the RenderGraph changes, the tasklist is rebuilt.

    Which is fine, but a bit inflexible.

I suggest we refactor the Nodes a bit, so that we (the Manager) can decide what their input and output FBO is. This instantly makes the Nodes far more flexible with very little work. Once again, note that this represents the working FBO that the layer is writing to. All nodes have access to other FBOs through the FBOManager. Then, Nodes shouldn't try to operate on things outside their Layers, aside from the FBO they were given when they were run. We'll have special mechanisms for handling things that need to cross Layers.

    Honestly, this is probably something that should be implemented anyway irrespective of this project.

    One caveat of this approach:
• If a Node is meant to process the current FBO, it must not try to read it from the FBOManager (that would defeat the whole point); it should just use the one it was given.
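A tiny sketch of what that refactor might look like. These are hypothetical names, not the actual Node interface:

```java
// Hypothetical sketch of the refactor discussed above: the node no longer decides its
// output FBO internally; whoever builds the task list wires it in at runtime. None of
// these names exist in the engine; Fbo is a stand-in for a real framebuffer object.
class RuntimeFboSketch {
    static class Fbo {
        final String name;
        Fbo(String name) { this.name = name; }
    }

    static class TintNode {
        private Fbo output; // injected externally, never looked up from an FBOManager

        void setOutput(Fbo fbo) { this.output = fbo; }

        // stand-in for the actual GL work; returns a description for illustration
        String process() {
            if (output == null) throw new IllegalStateException("no output FBO wired");
            return "tint -> " + output.name;
        }
    }
}
```

The same TintNode instance can now be pointed at different layer buffers without touching its source, which is what makes nodes reusable across layers.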

    Before I get to the rest, I need to clarify some things.

    First, some motivation. The main goals of this architecture are:
    • Not having 190 lines of code in the initialize function (I counted).
• Making the API and process as declarative as possible (without becoming JSON or something; we don't need it to be that easy).
• Creating as many logical divisions between the Nodes as possible (in a way that isn't completely ridiculous). This means grouping Nodes into Modules, and Modules into Layers. (Or should we skip Modules and go directly to Layers? Not so sure about this one.)
    • For example, the current render graph has 4 resampling nodes right after each other. We could group those in a module.
    • Making modules modular. You should be able to connect modules to each other in ways the designer of said module did not expect.
    • Making modules reusable. It should be possible to use a shader module(or a tint module) defined in the engine, to apply a constant effect to any FBO you want with minimal effort.
    • Making debugging much easier. Implementing the render debugger would be almost trivial, if we get this architecture working.
• Maintaining performance. All this stuff looks expensive, but we can fairly easily resolve all the dependencies and URIs and stuff and squash the whole thing, and then use the resulting list of Nodes to render. Layers are also basically semantic constructs, and have almost zero overhead in normal operation.
    A note on Modules:
    • A module is a logical grouping of Nodes. It has no correlation with the module system that Terasology uses, and I should probably find a better name for them.

    Layers

    When I was talking of layers, I meant these:
    (image attachment: diagram of the proposed layers)

Each layer represents a different set of dependent modules to be rendered, and was also kinda inspired by your issue, @manu3d. A layer may be rendered after the previous one, or it may be declared as a sublayer with the help of a special Node. Each Layer passes around an FBO to its modules, which they are supposed to draw to. Splitting things into layers can be extremely arbitrary. A major goal here is dealing with the dependency issue @manu3d mentioned. One possible breakdown of layers (for representative purposes; this is probably terrible for a whole host of reasons):
    • Game UI
• Stats page (like the one Minecraft has)
• NUI interfaces
    • Foreground objects
    • Middleground objects
    • Background objects
• Outputs from other render targets (CCTV monitors, portals, whatever)
The main motivation for layers is that they are basically sandboxed; they (logically) draw their output to a separate FBO. The main benefit of this approach is that, assuming the modules that set things like overlays set alpha channels correctly, the debugger becomes a GCI task. Here's what we could do:
    • Move the offending nodes up to a new SubLayer.
    • Tell that Layer to enable sandboxing.
    • Add a tint Node to the end of the Layer.
    • Profit.
    A note on layer sandboxing:
    • The goal is to separate the effect of a layer from that of all the other layers.
    • This can be achieved by the following process:
    • The first Node of the Layer is given a new FBO to write to.
    • That FBO is passed to each Node in the Layer.
• The final output in that FBO is then composited back onto the main output.

    Each Node object's URI specifies which Layer it belongs to.
The NodeManager/WorldRenderer (or whatever's running this whole show) is responsible for providing the Layer with its target FBO.
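The sandboxing steps above could be sketched like this. This is purely illustrative: buffers are modeled as single RGBA pixels and the blend is a plain "over" operator; none of these names exist in the engine:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Purely illustrative sketch of layer sandboxing: a sandboxed layer gets a fresh
// buffer, its nodes write only there, and the result is alpha-blended onto the
// main output. Buffers are single RGBA pixels (float[4]) for simplicity.
class SandboxSketch {
    // Porter-Duff "over": blend the layer's pixel on top of the main buffer's pixel.
    static float[] over(float[] src, float[] dst) {
        float a = src[3];
        return new float[] {
            src[0] * a + dst[0] * (1 - a),
            src[1] * a + dst[1] * (1 - a),
            src[2] * a + dst[2] * (1 - a),
            a + dst[3] * (1 - a)
        };
    }

    static float[] renderSandboxed(List<UnaryOperator<float[]>> nodes, float[] main) {
        float[] layerFbo = {0f, 0f, 0f, 0f};                          // fresh FBO for the layer
        for (UnaryOperator<float[]> node : nodes) layerFbo = node.apply(layerFbo);
        return over(layerFbo, main);                                  // composite onto main output
    }
}
```

This is also where the "alpha channels set correctly" assumption bites: if a layer writes alpha 1 everywhere, the composite simply replaces everything underneath it.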

    Dependencies:
    • Regarding dependencies: say node Engine:A renders the landscape and node Module:B recolors the output of A. Then comes node OtherModule:C who wants to add a lens flare to it all. You could say that in this circumstance node C is dependent on node B, in the sense that it should be processed after B has been processed, to take advantage of B's output. Except it's not strictly necessary. If node B is not present in the graph, node C could work directly with the output of A. So, what is node C really dependent on? Or rather, more generically, how does a node really know where to place itself? Layers would help, for sure, but within a layer I'm not sure I've heard or come up with a generic solution other than asking the user to use his judgement and do the wiring via a graphical interface.
I've been sitting in front of this open tab for a few hours now, trying to work this one out, and I'm more convinced than ever that layers are the right solution. Here's what I've thought of.
• The complicated way would be to make B depend on A, and create a special extra slot at the end of a layer for after-effects processing. We could then move all those Nodes there. If multiple Nodes want to do fancy processing on the output FBO, we could clone the FBO, send a copy to each to be processed, and blend/overlay/add/whatever them at the end. This tactic could also be used to solve some general dependency conflicts.
• The simple way would be to move module C to a higher layer. This specifies that C depends on the Layer containing A and B, not on any of the Nodes in the Layer. A bit of refactoring to support this, and problem solved. See the power of layers?
    • And yes, this would probably lead to us resolving load order by creating tons of Layers. That should be okay though, since Layers don't actually add much to overhead, and they completely vanish in the final TaskList.
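The "simple way" can be illustrated with a toy resolver. Everything here is hypothetical: the idea is just that a layer's output is whatever its last node produced, so removing module:B makes other:C fall back to engine:A automatically:

```java
import java.util.List;

// Toy illustration of layer-level dependencies (all names hypothetical): a node in a
// higher layer asks for "the output of the layer below" instead of naming a specific
// node, so the wiring survives nodes being added or removed from that layer.
class LayerDependencySketch {
    // A layer's output is the output of its last node in render order.
    static String resolveInput(List<String> lowerLayerNodes) {
        if (lowerLayerNodes.isEmpty()) throw new IllegalStateException("empty layer");
        return lowerLayerNodes.get(lowerLayerNodes.size() - 1);
    }
}
```

With ["engine:A", "module:B"] the lens flare reads B's output; with B removed it reads A's output, and no node ever had to name either one explicitly.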
    To sum up(tl;dr):

    A tentative list of changes:
    • Modify each Node/NodeBuilder/whatever to take input and output at the time of render pipeline init.
    • Create a Module class (maybe builder) to create a sub-dag of Nodes that perform a single logical function.
• Create a Layout class (maybe builder) to create a dag of Modules that draw to a single FBO (that may or may not be the one everyone else is drawing to).
• Create a 'smart' Manager (or add the functionality into the WorldRendererImpl) to set this whole thing up, connect the Modules and Layers together, resolve load conflicts by moving things through layers, and squash the whole thing into a TaskList.

    Current concerns:
• Would generating the tasklist be too expensive to run every time the graph rebuilds? We may need to rebuild only the relevant module, but that would require tracking exactly which parts of the graph changed.
• Is it very difficult, for whatever reason, to insist that the tasks take their FBO input and output targets at runtime? I haven't looked into the code in too much depth, so I'm not sure.
    • We may have too many layers of tangled abstractions. Should we just get rid of the Modules?
    • Is the layer system overkill? There probably aren't going to be many people writing code that directly interferes with the render pipeline, after all. And most of them can just use a NUI overlay or something.
• Do things that create partial overlays (like NUI windows) use alpha channels correctly to allow things behind them to shine through? If they do, we're all good. Otherwise, I've got no clue how to get this to work.
• How heavy is the overhead from creating a bunch of extra FBOs and then overlaying them on each other? If it's within 2-3x the optimized pipeline, it should be fine for render debugging.
• Designing the code for building the pipeline is probably gonna feel like writing a compiler.
      • First we load the Modules, who also load their Nodes and Tasks.
  • Then we construct a graph in-memory with all the modules by resolving the URIs.
  • We resolve conflicts by moving things through Layers.
      • Then we squash the whole thing into a TaskList.
    • I'm not sure how to move, create, and link Layers yet. Do we allow Layers to basically be treated as subroutines embedded within other Layers? How do we deal with Layer dependencies? Do Layers even have dependencies? (Yeah, this is as far as I've thought right now.)

    I'm no architect, and this architecture is probably gonna need a lot more revisions to get into shape. Any suggestions at all would be immensely helpful.

    And now I'm gonna go find a bucket of ice to try to cool down my overheated brain.
  5. manu3d

    manu3d Pixel Forge Artisan

    Hi @Hybrid and thank you for your detailed answer.

I think our thinking is aligned on most things and similarly puzzled on a number of things that need to be better thought through. I don't have the time today unfortunately to respond in depth. Also tomorrow will be difficult. I should be able to respond on Wednesday night, but overall you are on a very good track.

One request for today: please edit your message to remove all references to the word "Module" and replace it with something else. Terasology uses the word Module to describe what other software calls "plugin" or "extension". A module is -the- way Terasology is extended. A module or a set of interdependent modules can provide anything from a completely new gaming experience to different world generators, from new textures to new sounds. One day the whole renderer could be an external module. Or nodes to be added to the renderer's DAG could be provided and inserted by an external module. So, in a vacuum your use of the word "module" would make sense. In the context of Terasology it overlaps with a very specific concept and creates ambiguity.

    In your post, if I am understanding you correctly, you use the term "module" to describe a group of nodes that are closely related to each other. Perhaps even atomically related to each other (shouldn't be separated). I'd therefore suggest you look for some terms reflecting this. NodeGroups and AtomicGroups perhaps. In this context it is appropriate to wonder if the "layer" abstraction is necessary. Perhaps it is and perhaps it can be implemented via NodeGroups rather than with special handling.

    I must stop here though. Much to do for the next couple of days. Stand-by for more.

    Meanwhile you might want to start drafting your proposal(s) in a Google Docs, and if you want we can continue this discussion right on the proposal.

    In any case thank you for your interest in this topic. This is a very productive conversation we are having.
  6. Hybrid

    Hybrid New Member

    Yeah, I did think that I would need a better name for Modules. Ontological conflicts are no fun.
    Modules are supposed to represent Nodes with strong dependencies, which means they must run in that order and must not be separated. AtomicGroup kinda suggests that the whole group is atomic, which isn't how it's supposed to work.
    I'll think of something.
    I've started working on the proposal in a Docs file, I'll link here when it's somewhat presentable.

    Also, I plan to clean up the render code a bit over the next few days, are there any parts of the code that are particularly messy?

    Thanks again for your time @manu3d, I wouldn't have figured out a tenth of that without your input. :D
    Last edited: Mar 21, 2017
  7. Hybrid

    Hybrid New Member

    Okay, here's my draft: https://docs.google.com/document/d/1izXnrN5S5vudgPabKB81xYAC41ptY6uCVmA7xMAQvaM/edit?ts=58d38764#

    Here's the biggest things I'm trying to work out right now:
    1. A better name for `Layer`.
    2. How do you allow Nodes to define interfaces to receive FBOs through, in a way that's fairly resistant to nooby wiring?
    3. How do you define attachment points for Layers?

    Let me know if you spot anything.

    Also, @vampcat and I have been bouncing ideas back and forth, so if you see a lot of parallels there, that's why.
  8. manu3d

    manu3d Pixel Forge Artisan

    I'm glad you are liking it so far. And yes, we will ignore the screaming. :laugh:


Thank you. And I agree with your assessment regarding fragility and inflexibility. My very first goal was to structure it so that people could finally understand at least the broad strokes of what it does. This more or less happened between my work pre-GSOC2016, tdgunes' work, and post-GSOC2016 work. Now we have to finish the transition of the high-level architecture toward a DAG-based one, cleaning up what we can in the process (but without this being a priority) and eventually opening the DAG up for module developers. After that the field will become wide open for proper improvements, i.e. new visual effects, alternative rendering styles, switching to higher OpenGL releases and all sorts of other cool stuff. Above all, the goal is that it won't be necessary to change the architecture to change the renderer. Just modify/replace a set of nodes, even all of them.

    I guess here our visions differ a bit, because I see only one final render target for the time being. CCTV feeds are ultimately shown within that same image, so I don't consider them separate: behind those cameras there would usually be a smaller/simpler DAG that is part of the main DAG. For me multiple render targets really come into play when we deal with multiple monitors/displays, i.e. when you have a VR HMD -and- the same image shown on screen.

    My concern in this context is performance. Ideally the task list should be refreshed quite quickly, e.g. for transient effects. For that reason I think the indirection of URIs to connect Nodes to other Nodes feels unhelpful. But I'd certainly use URIs to identify Nodes, NodeGroups and input/output sockets - which is the term I'd recommend instead of ports. In this context I'm not sure where your worries about injection/contexts fit in.

    If I look at the renderer as it is now, I agree, especially with the statement about Nodes deciding their input/output FBOs. I guess I'd use the term "declaring" rather than "deciding". I don't agree so far with the one-FBO-per-layer idea, but we have discussed this in chat and we both know it is still an issue for debate.

    From my perspective, having NodeGroups that group nodes (and potentially other NodeGroups) within them should be sufficient, as it would leave plenty to work with for external developers.

    It's an interesting concept but a bit vague. In the sense: sure, why not, but how would that work?
    On layers, I'd recommend that they be implemented as NodeGroups potentially containing other NodeGroups. But I wouldn't dictate what the layers are. Of course, as the renderer starts taking advantage of layers, we'd group nodes into them. But I wouldn't want any layer to be nailed to the renderer and impossible to remove.

    While I agree layers would be good, even if they are implemented as standard NodeGroups with some kind of "<name>Layer" URI, I'm not sure debugging a node will ever be that simple. My strategy in this context would be to move the FinalOutputNode at runtime and connect it to a node to see its output, probably using an intermediate bridge node that converts the content of a given FBO into something that can be displayed on screen, and perhaps additional nodes for tinting/highlighting a node's contribution. But I'm not sure I'd want to move a node out of its context. I suspect in some cases it wouldn't be meaningful/desirable to do so. Moving a node elsewhere in the graph would also limit the possibility of toggling quickly between debug mode and normal rendering.

    I'd prefer to see something like NodeGroup.add(NodeURI) method to add a node to a nodegroup, with NodeGroup inheriting/implementing some kind of Graph interface/functionality.

    Would require... ?

    I don't think it would be overly difficult on the output side, as currently only one FBO can be written to at any given time. It's a bit trickier on the input side, as inputs are not really FBOs but FBO attachments, plus other texture buffers not attached to FBOs. And there can be more than one per node.

    In line with previous discussions I'd recommend NodeGroup that can contain nodes and other NodeGroups. If hierarchies deeper than 1 level are needed, that should solve the issue.

    I don't think so. Grouping nodes together is important even to just talk about them, i.e. these are pre-production nodes, these are nodes similar to shooting on set and these are post-production nodes.

    Let's not worry about what happens on top of the rendering for the time being. With the architecture we are discussing it becomes possible for the UI to become a customer of the DAG, true. But I'd leave it to the UI people to decide what to do when the possibility has actually materialized.

    I'd recommend you think about how engine and external developers would interact with the code in this context. But if you follow my advice of thinking about NodeGroups as Nodes themselves, it might be easier.

    This is an excellent discussion. Thank you for it!
  9. Hybrid

    Hybrid New Member

    This seems to be a minor point of contention for us, but I don't think it's very relevant right now. I'll mark this as a stretch goal.

    Even if we assume that we have 500 nodes, which is already way too many, with three connections each, resolving the URIs would take 1500 string-to-reference lookups. The impact of that should be negligible.

    And if we needed to quickly switch between multiple render paths, we could just compile each beforehand, and that handles the whole issue. Or do something more complicated, like compiling each NodeGroup separately and then linking them together later (sockets could be useful here).
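    To make the cost argument concrete, here's a toy sketch of a URI-to-node registry, assuming a plain HashMap (the class and method names are invented for illustration, not the engine's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve string URIs to node references once, at
// task-list generation time, so per-frame rendering never touches strings.
public class NodeRegistry {
    private final Map<String, Object> nodesByUri = new HashMap<>();

    public void register(String uri, Object node) {
        nodesByUri.put(uri, node);
    }

    // A HashMap lookup is O(1) on average; even 1500 of these per graph
    // rebuild should be negligible next to the GL work a frame does.
    public Object resolve(String uri) {
        Object node = nodesByUri.get(uri);
        if (node == null) {
            throw new IllegalArgumentException("Unknown node URI: " + uri);
        }
        return node;
    }
}
```

    The point is that the string indirection is paid only when the task list is regenerated, never per frame.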


    Firstly, layers are dead. NodeGroups can handle those roles.

    This ties into the data-flow graph architecture, which we've already discussed. More details below.

    That would mean that the Node doesn't know which NodeGroup it's in, which is probably a better way of doing it. Though we may want to allow random access to the Nodes within the NodeGroup as well.

    That works. We do need to think more about properly encapsulating NodeGroups though, and about making Nodes and NodeGroups as similar as possible, both implementing the same interface if possible.

    Note: maybe RenderGraph and NodeGroup can implement the same interface as well?

    I'll see if I can get some code examples for this. Things should be clearer then.

    Some points of contention:

    The big one here is the actual purpose of the graph. Specifically, as a scheduler vs. as a processor. The main difference between the two is how they think of the links between Nodes.

    As you seem to think of it now:
    - FBOs and texture assets are Nodes of their own.
    - The graph is basically a scheduler.
    - Each Node already knows which FBO it wants to draw to.
    - Edges represent dependencies of some form.

    This works, but it also requires each Node to know which FBO it is going to write to and read from in advance (which can still be provided at runtime). This turns the inputs and outputs into configuration parameters, which is fine, albeit a little less intuitive.

    This is how a data-flow graph would work:
    - Each Node declares input and output sockets/ports (think of an IC pin map).
    - Each Node processes its inputs and produces some outputs.
    - Each edge represents an FBO/texture being passed to the next Node.
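    A minimal sketch of what a data-flow node could look like in Java (all names here are invented to illustrate the idea, not proposed API):

```java
import java.util.List;
import java.util.Map;

// Hypothetical data-flow node: it declares named sockets up front, and the
// graph later supplies matching objects (FBOs, textures, cameras...).
interface DataFlowNode {
    List<String> inputSockets();   // e.g. ["in"]
    List<String> outputSockets();  // e.g. ["out"]
    void process(Map<String, Object> inputs, Map<String, Object> outputs);
}

// Trivial example node: forwards whatever arrives on "in" to "out".
class PassThroughNode implements DataFlowNode {
    public List<String> inputSockets()  { return List.of("in"); }
    public List<String> outputSockets() { return List.of("out"); }
    public void process(Map<String, Object> inputs, Map<String, Object> outputs) {
        // A real node would issue GL calls here; this one just forwards the object.
        outputs.put("out", inputs.get("in"));
    }
}
```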

    The main benefit of this is that it's very easy to explain and understand. Here's an example:
    Imagine the process of producing anime (or a cartoon, if you're not that kinda guy):
    - An artist draws some concept art.
    - The storyboard artist draws keyframes using the art as reference.
    - An artist builds a 3d model of the scenes.
    - Another artist traces out the render.
    - Another artist does basic paint work.
    - Another artist does the finer details.

    One easy way to conceptualize this is that:
    - for the dataflow graph: each person draws their part, and then passes the drawing pad to the next person.
    - for the scheduler graph: Each person sits in a line, then gets up one at a time and goes to a central workstation to do their job, then sends the next one along.

    Ultimately, I don't think it matters too much which one we pick. The data-flow graph is more intuitive and more powerful, but both approaches are valid. There are also a few technical challenges that may be deal-breakers here.

    Major caveats/downsides:
    - FBOs all come in various non-standard sizes, so making a Node fully agnostic of what it's drawing to could be tricky, which may nullify quite a few benefits of the whole architecture.
    - The whole TaskList is, by nature, static, while the data-flow graph is dynamic. This means the compiler would have to be a fair bit more complicated. Not that much, just enough to automatically resolve implicit data-flows to actual FBO objects.

    I will need to change a lot of things in my proposal if we don't use a data-flow graph though, so a quick decision here would be much appreciated.

    As a stretch goal, I'm thinking of trying to normalize those FBOs to one of a set of standard configurations, to allow some flexibility without wasting too much space. This really is a critical thing to do eventually, if not ASAP, or someone will get the size of an FBO wrong when writing their Node, things will go haywire and weird artifacts will show up, people will scream in inarticulate rage at my incompetence, and our abstractions will leak all over the place.

    As another stretch goal, I'd like to see how to wire Nodes into FlexibleConfig, and, if possible, figure out a way to automatically publish certain settings to the in-game settings menu.
    (This may just end up being a whole project of its own).
  10. manu3d

    manu3d Pixel Forge Artisan

    Thank you @Hybrid for your reply. As I did in the previous email (albeit I didn't explain it), I will omit items on which we are aligned or almost completely aligned. Here I'll only reply to items on which I have doubts or on which it's worth making a remark.

    Ok, let's hope so.

    Please let's talk in terms of generating the task list rather than compiling. Regarding multiple render-paths, it's an interesting thought of producing multiple task-lists that can be easily swapped. Let's keep this as a low priority item though.

    I don't know why you came to that conclusion. Of course we are getting into pretty deep implementation details, and perhaps that's premature. But I'd suggest that a Node should know about the NodeGroup it's in, i.e. to be able to plug itself into the sockets of the NodeGroup and to find other nodes within it.


    Yes, I'd recommend that Nodes and NodeGroup implement a common interface, with NodeGroup having additional methods to access the contained nodes. RenderGraph implementing the same interface as NodeGroup? I'd say yes, as far as graph operations are concerned, because a NodeGroup is effectively a mini-graph.

    Yes please. And keep ease of use in focus please!

    First of all thank you for digging down into the issue and further describing differences and advantages. I did not quote everything you wrote on this regard because fundamentally I accept what you are saying.

    One concern, one of the reasons that has kept me on the fence, is this "special treatment" that FBOs get with the DataFlow approach. From my perspective FBOs are just one type of data being processed. Cameras, meshes, entity lists, render settings are all data that nodes use as input. I suppose the two gBuffer FBOs are a bit special, in the sense that indeed they could be considered as "the car" on the assembly line being passed from one worker (node) to another. At least until we get into the post-processing area, where there is much less of a pattern in this context and many nodes write to their own buffers, to be picked up as input for another node.

    I also feel that in your description the meaning of the edges in the graph in the "scheduler approach" is somewhat limiting: while it's true many of the edges would represent dependencies and imply a processing order, some edges would connect nodes to their camera, while other edges would connect nodes to FBO and texture nodes.

    I'd say this: I can see you feel strongly about it and your mind has been working on this paradigm for a while now. Just go with it. Go with the DataFlow approach. Ultimately it has to be your proposal. If eventually your proposal gets chosen, I'll support your approach.

    At this stage I'm not understanding this issue you have with different-sized FBOs, or why you would want fully agnostic nodes. Can you elaborate?

    Also here I'm not sure I understand your concern. Why should the compiler/tasklist generator be much more complicated than it is right now? Of course it will need to handle NodeGroups, the nodes they encapsulate and the edges of the graph. But you seem to think complexity will increase considerably. Why?

    Well, I look forward to your PR in this context. Some of what you mention, e.g. getting the FBO size wrong, doesn't seem a programmatically insurmountable obstacle. And many FBOs are the way they are for good reasons. But there certainly is a feeling that some things might be improvable. For example, I have been wondering if in the post-processing portion of the renderer we should have a pair of swappable FBOs like the gBuffers but with fewer attachments, i.e. just a color attachment. We'd then pass those FBOs along the post-processing nodes like we do with the gBuffers. Something to think about.

    Indeed feels like a stretch. I hope you'll be able to be a FlexibleConfig customer with the class being fully capable rather than you having to work on it personally.

    Again, thank you for your interest in this topic.
  11. manu3d

    manu3d Pixel Forge Artisan

    Actually, I'm not sure about this one anymore.
  12. Hybrid

    Hybrid New Member

    Once again, my bad for taking so long to respond. I've just been banging my head on the floor for a few hours over how bad I am at conveying important information.
    You can tell this is pretty much the first time I've ever had to communicate an idea of mine to someone else (that sounds really sad, now that I say it).


    Before we get into the other stuff, I'd just like to get the graph stuff clear. The data-flow graph thingy only really matters at the highest level, and the complexity of it centers on the task list generation stage. It shouldn't actually require much modification of the existing code at all. Here's why:

    Currently the system works like this:
    When the render system starts up, it:
    - Creates an object of each Node, runs its initialise() function, and adds it to the graph.
    - Toposorts the Nodes into a list.
    - Converts the List<Node> into a List<RenderPipelineTask>
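    The three steps above can be sketched roughly like this, with the topological sort written out (toy code with invented names, not the engine's actual classes; it assumes the graph is a proper DAG, i.e. no cycles):

```java
import java.util.*;

// Rough sketch of the current startup flow: nodes are registered with their
// dependencies, toposorted so dependencies come first, then each node would
// contribute its RenderPipelineTasks (omitted here).
class StartupSketch {
    // deps: node -> the nodes it depends on
    static List<String> topoSort(Map<String, List<String>> deps) {
        List<String> sorted = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String node : deps.keySet()) {
            visit(node, deps, visited, sorted);
        }
        return sorted;
    }

    static void visit(String node, Map<String, List<String>> deps,
                      Set<String> visited, List<String> sorted) {
        if (visited.contains(node)) return;
        visited.add(node);
        for (String dep : deps.getOrDefault(node, List.of())) {
            visit(dep, deps, visited, sorted);  // dependencies are emitted first
        }
        sorted.add(node);
    }
}
```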

    I mentioned the high-level/low-level split, and higher abstraction levels, but I don't think I ever really explained what I was thinking (probably because I only vaguely understood it myself).

    The low-level part is everything we have now. Data-flow is really irrelevant at this level.
    There's not actually too much to change here:

    Note: An edge can represent any Object flowing through the graph. There's no reason to give FBOs first-class treatment.

    Nodes:
    - Instead of the initialise() function, each Node should have a setInputs(), setOutputs(), and a setParameters() method, each of which receives a Map<"label", Object> or something like that.
    - Each Node needs to be able to be created in a default uninitialized state.
    - Each Node needs to declare a set of sockets that it inputs/outputs through. Its init functions would then be called with things matching those sockets.
    - We may also want each Node to provide a function that verifies that it received the correct input/output parameters.
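    A bare-bones sketch of that lifecycle (class and method bodies invented to show the shape, not real engine code): the node is constructed blank and only becomes usable once the graph has supplied all three maps.

```java
import java.util.Map;

// Hypothetical node lifecycle: default-constructed in an uninitialized state,
// then configured by the graph via three setter calls before first use.
class ConfigurableNode {
    private Map<String, Object> inputs;
    private Map<String, Object> outputs;
    private Map<String, Object> parameters;

    // Default constructor leaves the node as a blank slate.
    public ConfigurableNode() { }

    public void setInputs(Map<String, Object> inputs)     { this.inputs = inputs; }
    public void setOutputs(Map<String, Object> outputs)   { this.outputs = outputs; }
    public void setParameters(Map<String, Object> params) { this.parameters = params; }

    // The node is "ready" only after all three maps have been supplied -
    // this is the readiness question that comes up later in the thread.
    public boolean isReady() {
        return inputs != null && outputs != null && parameters != null;
    }
}
```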

    Tasks/StateChanges:
    - Basically nothing. I'm not entirely sure, but I don't think we need to change a thing here.

    Graph:
    • This is the big one. As it is now, the Graph stores objects of fully configured Nodes, with no explicit connection between them.
    • This needs to change in a few ways:
      • The Nodes need to be uninitialized. This is because we will automatically configure them later.
      • Dependencies are now explicit. I'm not entirely sure how to do it, but an edgelist is a valid approach, or an adjacency matrix if we're that worried about performance. This would connect a labeled socket on one Node to a labeled socket on another Node.
    • Now, when we want to convert this Graph to Tasks, we would first configure the Nodes in accordance with the edges, to ensure that the output of a socket goes to where it's supposed to go. This is the added complexity I was talking about. We would need to deduce a lot of implicit things and manage the low-level details automatically.
    Note that even though it's called data-flow, there's no actual data flowing around; rather, Nodes write to an object and pass references around.
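    A toy sketch of that wiring step, with invented names and a plain Object standing in for an FBO: for each edge, one shared object is created and handed to the producer as an output and to the consumer as an input.

```java
import java.util.List;
import java.util.Map;

// Illustrative "configure nodes from the edges" step: at task-list generation
// time each edge gets one shared object, referenced by both endpoints.
class WiringSketch {
    static class Edge {
        final String fromSocket, toSocket;
        final Map<String, Object> producerOutputs, consumerInputs;
        Edge(String fromSocket, Map<String, Object> producerOutputs,
             String toSocket, Map<String, Object> consumerInputs) {
            this.fromSocket = fromSocket; this.producerOutputs = producerOutputs;
            this.toSocket = toSocket; this.consumerInputs = consumerInputs;
        }
    }

    // "No actual data flowing around": both nodes simply end up holding a
    // reference to the same object (in practice, an FBO or texture).
    static void wire(List<Edge> edges) {
        for (Edge e : edges) {
            Object shared = new Object();
            e.producerOutputs.put(e.fromSocket, shared);
            e.consumerInputs.put(e.toSocket, shared);
        }
    }
}
```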

    The high-level aspect of this would be the addition of a bunch of API functions to the Graph that allow access in a way that feels like data-flow. The API is kinda like the Apple Shake interface, and basically has 2 functions:
    • addNode(Node n, String label/URI): Add a Node to the graph.
    • connect(Node n1, String socket1, Node n2, String socket2): Draw an edge connecting 2 sockets.
    Note: NodeGroups also implement the Node interface and the Graph interface, so they have the above methods as well (which could get confusing fast).
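    A toy sketch of how those two methods might hang together, with Object standing in for the Node type and all node/socket names invented:

```java
import java.util.*;

// Hypothetical two-method Graph API: addNode() registers nodes under a URI,
// connect() records an edge between two labeled sockets.
class GraphApiSketch {
    private final Map<String, Object> nodesByUri = new LinkedHashMap<>();
    private final List<Object[]> edges = new ArrayList<>();

    public void addNode(Object node, String uri) {
        nodesByUri.put(uri, node);
    }

    public void connect(Object n1, String socket1, Object n2, String socket2) {
        // Both endpoints must already be in the graph.
        if (!nodesByUri.containsValue(n1) || !nodesByUri.containsValue(n2)) {
            throw new IllegalStateException("Both nodes must be added to the graph first");
        }
        edges.add(new Object[] {n1, socket1, n2, socket2});
    }

    public int edgeCount() {
        return edges.size();
    }
}
```

    Usage would look something like: addNode(blurNode, "engine:blurNode"); addNode(outputNode, "engine:finalOutput"); connect(blurNode, "out", outputNode, "in").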

    We could also have a bunch of convenience functions, to do both at once, but that's not really important right now.

    I'm not sure if Nodes should be required to have at least one FBO input and FBO output. It may make sense, or it may be too restrictive.

    We'd also want to have some special provisions for fall-through Nodes, i.e. Nodes that just do some processing and pass the same Node along.

    Random note: We should decide whether we spell it "initialise" or "initialize". It's spelled both ways in the same file. Not a big deal, but it's triggering my OCD.
  13. manu3d

    manu3d Pixel Forge Artisan

    No worries. This is what the GSOC is for: if you didn't stretch your comfort zone, if you didn't learn anything from it, that's when it would be a failure.


    So, would a camera flow through the graph by virtue of nodes being connected to each other? And what happens when the camera changes, because previous nodes rendered a 3d scene from a perspective camera while following nodes are 2d post-processors rendering a quad through OpenGL's default camera?

    Side note: this introduces the concept of "readiness" in a node, in the sense that right now a node is ready on construction or after the initialize() call. With the interface you are describing a node is only ready when it has everything it needs, probably after at least three calls to the methods you suggest.

    It would be nice if you took an existing node, maybe one of the simpler ones, and wrote a piece of code that shows how those methods would be used in practice, so that we can see how the initialization phase would look like.

    This is what happens with a number of nodes whose initialization is delayed: seems natural to me if you don't want to rely on constructors. Why do you mention this?

    I didn't understand this. What's the difference between this and the phase in which the setInputs/Outputs/Parameters methods are called?

    With edgelist do you mean a list of edges stored in the node itself or in a separate structure?

    This part I definitely do not understand. Why can't the nodes be initialized and ready to go as soon as they are added to the graph? I mean, of course if nodes A and B are in a chain, when I instantiate node A I can't plug it into node B because it isn't there yet. But as soon as I instantiate B I'd have the opportunity to establish all the connections and be ready to go.

    Again, I come back to the need to see a practical case of DAG initialization rooted on existing nodes.

    I've always been thinking more node-centrically, to give as much power as possible to developers who write the nodes. In this context I wonder what the advantages/disadvantages would be in having something like:

    graph.addNode(nodeB, "engine:aLabel")
    nodeB.socket("engine:inputFbo").connect(nodeA.socket("engine:outputFbo"))

    One advantage is that it delegates to the nodes being connected the job of validating the connection and especially the objects passed through it. Using a graph.connect() method you'd have to build a lot of responsibility into the graph instead and you'd have to build a mechanism for nodes to register other validators.

    I'd say: graph.addNode() adds a node to the graph obviously, while nodeGroup.addNode() adds a node to a nodegroup. Then, the Socket.connect() method only works between nodes that are in the graph or are in the same nodegroup. I'm not sure how this constraint would work in the node-centered scenario though: a node developer might choose to avoid the constraint and simply connect to a node that isn't in the same scope. Perhaps the connect() method can be made final and its implementation forces a lookup to make sure both sockets are in the same graph/nodegroup.
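    A rough sketch of that constraint (all class names invented): each socket knows the scope - graph or node group - its node lives in, and a final connect() refuses cross-scope links.

```java
// Hypothetical node-centric wiring with the same-scope constraint enforced
// centrally: connect() is final so node subclasses can't bypass the check.
class Socket {
    final Object scope;   // the graph or node group this socket's node lives in
    Socket connectedTo;

    Socket(Object scope) {
        this.scope = scope;
    }

    final void connect(Socket other) {
        if (this.scope != other.scope) {
            throw new IllegalArgumentException("Sockets are not in the same graph/node group");
        }
        this.connectedTo = other;
        other.connectedTo = this;
    }
}
```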

    Some nodes might be purely computational and might not need FBOs at all. At least one node right now is purely computational: it requires as input a 1x1 pixel PBO and does not output to an FBO. I can certainly conceive purely procedural nodes that write something to an output FBO but require no input. So, I'd say anything can happen in this context. But if inputs/outputs are defined via maps provided to setInput/Output, the minimum requirements of a node should be enforced by the node itself, i.e. by refusing to function or logging errors if it doesn't have everything it requires.

    You mean pass the same FBO along? Sure, if necessary. I'd be curious to see a case for this.

    LOL. Initialise is British, which is as close as you can get to "the" English. I'd recommend sticking to that and not giving in to the UnitedStatians, who like to change everything they touch to make it theirs...
  14. Hybrid

    Hybrid New Member

    I think I've answered most of this in my rewritten proposal, but I'll go over the important points here.

    I can think of a few ways to do this.
    1. Set the current camera as a global parameter for the graph. This is less elegant than I would prefer, but it would work, and is probably the simplest way.
    2. Use a source Node to bring in new cameras whenever necessary, and pass cameras through the graph using "fall-through" connections (they tell the graph to use the same object for a certain input and output, details in the proposal).

    I don't really understand how OpenGL cameras work, so I can't really say anything more concrete than that.

    Working on it.

    That's for validating that the correct number of inputs, outputs, and parameters were received. Probably too much detail at this point though. I think it's becoming a habit for me to do that. I don't think that's a good thing.

    That's stored by the RenderGraph. Edges are part of a graph after all, and storing them in the Node could get messy fast.

    The Graph is responsible for figuring out how to connect Nodes together, and how to send data from one Node to another. It does this by creating shared Objects for the Nodes, and then configuring the two Nodes to use the same Object. The problem here is that this process happens just-in-time (read: lazily), so the Objects aren't available to give to the Nodes at the time when they are added to the RenderGraph. So the Nodes have to be set up as blank slates at the setup phase. You can still tell the RenderGraph that you want a connection involving a Node, but it doesn't actually do anything till it has to.

    The major advantage here is ease of use. All the Node has to do is declare its connections, sockets, and parameters (and validators for each, most of which are common and can be reused), receive the objects to operate on, and do its thing. Not worrying about the fine details of the plumbing is always a wonderful thing (think ZeroMQ or ROS, or actual plumbing).

    I'm not 100% sure how to handle this, but I think we can do it with a combination of default values and Nodes announcing their Sockets beforehand, so the graph can validate them automatically.

    I don't think we do something like this now, but here's one simple example: Suppose we wanted to add a (Gooey) watermark to the screen.
    We would have a gooeyNode right at the end, which just receives the FBO and draws the logo in one corner.
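    A toy sketch of that idea as a fall-through node, with a StringBuilder standing in for the FBO (in the real renderer this would bind the target and draw a textured quad; everything here is invented for illustration):

```java
// Hypothetical watermark node: it reads and writes the same target, so the
// graph would wire its input and output sockets to one shared object.
class GooeyNodeSketch {
    // The StringBuilder is a stand-in for the frame's FBO.
    void process(StringBuilder frame) {
        frame.append(" [Gooey]");  // "draw the logo in one corner"
    }
}
```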

    ...I kinda want to implement that now.
