Parsing a grammar

Skaldarnar

Development Lead
Contributor
Art
World
SpecOps
Hey folks,
I'm still unsure how to parse the grammars described in Villages Concept and stuff.
At the moment I'm just using a Scanner on the whole file and working with a lot of if-then-else constructs.
As the files have their own syntax, I thought about using a state machine (hand coded or using a library). Are there any other/better ways of doing this?
Or might there be a better structure for grammar files that is easy to parse and still readable/understandable?

All in all, a grammar could look like this (in concept, not carved in stone yet):
Code:
# Header with meta information
biomes :        PLAIN,DESERT,...;
styles :        MEDIEVAL;
functionality:  FORGE;
 
# Now the rules are following
building  : (height >= 12)
   ::- floor(height/3) floor(height/3) floor(height/3);
floor                   
   ::- facades rooms;
facades                   
   ::- Comp("sidefaces"){facade};
facade                     
   ::- ... ;
Any advice is appreciated ;)
 

Ten'son'

Member
Contributor
World
Too many words; the computer doesn't need all that.

Example of biome and style lines of your file:
B 12 7
S M

Example for the facades:
Facade1 StyleFacade24
Could be:
F1 24

Or it could also be written directly to a binary file.
 

Skaldarnar

Names like building or floor are the labels of (non-terminal) symbols in the grammar. And the grammar has to be human readable (it should be something like the JSON format for block definitions: understandable and "easy" to reconstruct).

Furthermore, these are just the grammar files stored on the hard disk. I want to parse them into a format I can handle better (at the moment the production rules are stored in a map, for example). Variables and attributes like height or sidefaces will be predefined for all grammars.
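
A minimal sketch of reading rules into such a map, assuming the concept syntax from the first post (statements end with `;`, `#` starts a comment line, `::-` separates a symbol from its successors). The class and method names are made up for illustration, and successors are split on whitespace only, so a successor like `floor(height / 3)` with inner spaces would need a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RuleTableSketch {

    /** Parses statements of the form "name [: (guard)] ::- successor ... ;" into a map. */
    static Map<String, List<String>> parseRules(String source) {
        // Drop comment lines (those starting with '#') and join the rest into one string.
        StringBuilder cleaned = new StringBuilder();
        for (String line : source.split("\n")) {
            if (!line.trim().startsWith("#")) {
                cleaned.append(line).append(' ');
            }
        }

        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (String statement : cleaned.toString().split(";")) {
            int arrow = statement.indexOf("::-");
            if (arrow < 0) {
                continue; // header entries such as "biomes : ..." are not rules
            }
            String head = statement.substring(0, arrow).trim();
            String body = statement.substring(arrow + 3).trim();

            // The optional guard, e.g. ": (height >= 12)", is ignored in this sketch.
            int colon = head.indexOf(':');
            String name = (colon < 0 ? head : head.substring(0, colon)).trim();

            List<String> successors = body.isEmpty()
                    ? new ArrayList<>()
                    : new ArrayList<>(Arrays.asList(body.split("\\s+")));
            rules.put(name, successors);
        }
        return rules;
    }
}
```

With the example grammar, `parseRules(text).get("floor")` would hold the two successors `facades` and `rooms`.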
 

Ten'son'

Why does it have to be human readable?
If it's for checking errors, you could write a little program that translates the file into a human-readable one. But the more letters and words there are, the slower the computer will be to open/read it.

EDIT: And we all know that the hard drive is the slowest component in a computer, so the less we have to use it, the faster the program will be.
 

Skaldarnar

It's not for checking errors or whether the grammar's syntax is correct, but for generating the buildings. So, with the grammar you define what a window could look like, how the building's corners should look, which roof types it can use, etc., and generate a random building following the rules (with different numbers of floors, different footprints and so on). Thus, the system should be easily expandable by mods adding new grammar files.
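
To make "generate a random building following the rules" a bit more concrete, here is a rough sketch of a random derivation. It assumes each symbol maps to a list of alternative successor lists and that a symbol without a rule is a terminal. The class name and the example symbols in the usage note are hypothetical, and there is no guard evaluation or protection against rules that recurse forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class DerivationSketch {

    /**
     * Expands the start symbol until only terminals remain, picking one
     * alternative at random whenever a symbol has several productions.
     */
    static List<String> derive(String start,
                               Map<String, List<List<String>>> rules,
                               Random random) {
        List<String> result = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);

        while (!pending.isEmpty()) {
            String symbol = pending.pop();
            List<List<String>> alternatives = rules.get(symbol);
            if (alternatives == null) {
                result.add(symbol); // no rule for this symbol: treat it as a terminal
                continue;
            }
            List<String> chosen = alternatives.get(random.nextInt(alternatives.size()));
            // Push in reverse order so the leftmost successor is expanded first.
            for (int i = chosen.size() - 1; i >= 0; i--) {
                pending.push(chosen.get(i));
            }
        }
        return result;
    }
}
```

For example, with a hypothetical rule set where `building` expands to either two or one `floor` followed by a `roof`, repeated calls with different `Random` seeds would yield buildings with different numbers of floors. A mod would only have to contribute more entries to the rule map.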
 

Immortius

Lead Software Architect
Contributor
Architecture
GUI
Grammars are intended to be written by humans, so having them human readable is a good idea. We would load them all up in memory during the start of the game anyway, so the file size isn't a big issue. Or we could provide something to convert them to binary format if it turns out to be an issue (don't prematurely optimise :) ).

You may wish to look into the ANTLR library. Or if you use a JSON format getting them into memory is simple with GSON - in which case you can just create a Java Bean that reflects the style with enums for many of the keywords. I'm not sure whether JSON would be sufficient for your purposes though.
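
The "bean with enums" idea does not depend on how the file is read, so here is a dependency-free sketch of it. It fills the bean from the plain-text header shown in the first post rather than from JSON, and the GSON mapping itself is not shown. The class name and the enum types are assumptions. Only the constants PLAIN, DESERT, MEDIEVAL and FORGE come from the example, and an unknown keyword makes `valueOf` throw an `IllegalArgumentException`.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

final class GrammarHeaderSketch {
    // Hypothetical enums; a real game would list all supported keywords here.
    enum Biome { PLAIN, DESERT }
    enum Style { MEDIEVAL }
    enum Functionality { FORGE }

    final Set<Biome> biomes = EnumSet.noneOf(Biome.class);
    final Set<Style> styles = EnumSet.noneOf(Style.class);
    final Set<Functionality> functionality = EnumSet.noneOf(Functionality.class);

    /** Parses header entries of the form "key : VALUE,VALUE;". */
    static GrammarHeaderSketch parse(String headerText) {
        GrammarHeaderSketch header = new GrammarHeaderSketch();
        // Remove '#' comments up to the end of their line.
        String withoutComments = headerText.replaceAll("#[^\\n]*", "");

        for (String entry : withoutComments.split(";")) {
            String[] keyAndValues = entry.split(":", 2);
            if (keyAndValues.length < 2) {
                continue; // blank remainder after the last ';'
            }
            String key = keyAndValues[0].trim().toLowerCase(Locale.ROOT);
            for (String raw : keyAndValues[1].split(",")) {
                String value = raw.trim().toUpperCase(Locale.ROOT);
                if (value.isEmpty()) {
                    continue;
                }
                switch (key) {
                    case "biomes":
                        header.biomes.add(Biome.valueOf(value));
                        break;
                    case "styles":
                        header.styles.add(Style.valueOf(value));
                        break;
                    case "functionality":
                        header.functionality.add(Functionality.valueOf(value));
                        break;
                    default:
                        throw new IllegalArgumentException("Unknown header key: " + key);
                }
            }
        }
        return header;
    }
}
```

The enums give a cheap validation step: a typo in a keyword fails at load time instead of silently producing a grammar that never matches.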
 

Cervator

Org Co-Founder & Project Lead
Contributor
Design
Logistics
SpecOps
Not that I'm really thinking/imagining it is a valid option, but I know we use Drools at work for writing business rules surrounding insurance policies. It is a rules engine with fairly plain English-like syntax (in Java, of course, or even Groovy). I'm not sure if that or some other existing rules engine could be used for something like building definitions? I honestly don't know what the difference would really be between a grammar system and a rules engine.
 

mkalb

Active Member
Contributor
Logistics
Groovy is very good for writing your own (human readable) languages/grammars.
 

Skaldarnar

I've worked with ANTLR before, and it is quite usable, especially for officially supported languages such as Java :)
I've had a look at that and it was quite an overkill to me (at least it seemed so ;)) Do I really need all this stuff like Parser, Lexer, ... ?
 

x3ro

Member
Contributor
GUI
I've had a look at that and it was quite an overkill to me (at least it seemed so ;)) Do I really need all this stuff like Parser, Lexer, ... ?
As a matter of fact you do. You could leave out the lexer and directly parse the language, without converting it to a token stream first, but that would make the parser much more complicated, afaik. Also, ANTLR generates all that stuff for you, which you can then invoke directly from Java.
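
To illustrate what the lexer stage buys you, here is a small hand-written sketch (not ANTLR output, and not the actual PAG definition) that turns a rule line from the first post into a token stream. The token kinds and the class name are assumptions made for this example. A parser built on top of it only has to look at token kinds instead of raw characters and whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexerSketch {
    enum Kind { IDENT, NUMBER, STRING, ARROW, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Alternatives are tried left to right, so "::-" must come before ":".
    private static final Pattern TOKEN = Pattern.compile(
            "\\s+|#[^\\n]*"                          // whitespace and comments (skipped)
            + "|(?<ident>[A-Za-z_][A-Za-z_0-9]*)"
            + "|(?<number>\\d+)"
            + "|(?<string>\"[^\"]*\")"
            + "|(?<arrow>::-)"
            + "|(?<symbol>>=|<=|==|[:;(){}/,<>*+\\-.])");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at offset " + pos);
            }
            if (m.group("ident") != null) {
                tokens.add(new Token(Kind.IDENT, m.group()));
            } else if (m.group("number") != null) {
                tokens.add(new Token(Kind.NUMBER, m.group()));
            } else if (m.group("string") != null) {
                tokens.add(new Token(Kind.STRING, m.group()));
            } else if (m.group("arrow") != null) {
                tokens.add(new Token(Kind.ARROW, m.group()));
            } else if (m.group("symbol") != null) {
                tokens.add(new Token(Kind.SYMBOL, m.group()));
            }
            // Otherwise the match was whitespace or a comment: nothing to emit.
            pos = m.end();
        }
        return tokens;
    }
}
```

For `floor ::- facades rooms;` this yields an identifier, the arrow, two identifiers and a `;` symbol. ANTLR generates the equivalent of this (plus the parser) from the grammar file, which is why the extra concepts pay off once the language grows.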
 

Skaldarnar

Okay, then I'll try my best (making a new cup of coffee) :D. Can I come back to you if I get into some serious trouble?
 

x3ro

Sure thing. I'm by no means an ANTLR expert, but I'll do my best to help you :D
 

Skaldarnar

Ok, I've put some things together right now, and before I continue doing crap I would like to ask you to have a look at what I've done so far...

First of all, there is the antlr3 grammar file (for defining the grammar itself) as it is so far (PAGDefinition.g).

It scans/parses the example grammar from below quite fine (visualized with ANTLRWorks). Am I on the right track? Are there things I should change?
 


Cervator

Neat! That looks very sophisticated and capable. Beyond my technical skill to comment on much, but it looks reasonable for a contributor to write rules, if not the language. What's a "PAG" ? :)

Is this technically a DSL then, or something else?
 

Skaldarnar

I tried to find a catchy and fitting acronym and came up with PAG - Procedural Architecture Grammar.

Yes, I think one can call it a DSL :) But it's the first time I've done something like that, so I'm trying to get as much feedback as possible during development ;)
 

mkalb

Groovy is very good for writing a DSL. And you can use the groovy editor within IntelliJ IDEA.

Gradle has an ANTLR plugin.
 

Immortius

I'm not sure I follow the example. I guess "vert" is the vertical axis (not sure what you plan to use for the two horizontal axes). It isn't clear whether the ground floor or roof should end up on the top or bottom though? Also not sure what the 'r' means in '1r'.

But presumably what we end up with is three layers of floor and then a layer of roof? Not sure why roof gets away with Roof() instead of I() (unless that is another method).

Anyway, I would suggest leaning towards the verbose to start - stick to full words, be descriptive in what things do. But feel free to use terms more suitable for the domain, rather than those typical to programming. Instead of instantiate, you might use "Plot", "Build", "Construct" or other building related terms for instance.

I can't really comment on the grammar itself, I haven't used ANTLR.
 

Skaldarnar

Thanks Immortius, I think I am going to change some of the keywords (would a set command be easier to understand?). The example grammar will grow over time and I will add more detailed information about the available commands and options. For the first attempt I stayed really close to the paper by Müller and Wonka and their language definitions.

For further explanation: the values in the Subdiv command are either absolute or relative. For the example rule, that means the ground floor should be 3 blocks high and the roof height is scaled to the remaining height (1r indicates that it's relative; this might change to only r instead).

The vert stands for the vertical axis, and I think I am going to divide things from the bottom to the top (as in the example). Other possible values for the direction/axis are listed in the PAGDefinition under direction (which actually are "X", "Y", "Z", "XZ" ...). I just thought vert is a bit more descriptive.
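
A small sketch of how such a split could be resolved into block counts, to make the absolute/relative distinction concrete. This is an assumption about the arithmetic, not the actual implementation: absolute values are taken as-is, relative values share the remaining extent in proportion to their weight, and any leftover blocks from integer division go to the last relative part. The class and method names are hypothetical.

```java
final class SubdivSketch {

    /**
     * Resolves size specs such as "3" (absolute) or "1r" / "r" (relative)
     * against a total extent measured in blocks.
     */
    static int[] resolveSizes(int total, String... specs) {
        int absoluteSum = 0;
        int weightSum = 0;
        for (String spec : specs) {
            if (spec.endsWith("r")) {
                weightSum += weightOf(spec);
            } else {
                absoluteSum += Integer.parseInt(spec);
            }
        }

        int remaining = total - absoluteSum;
        if (remaining < 0) {
            throw new IllegalArgumentException("Absolute sizes exceed the total extent");
        }

        int[] sizes = new int[specs.length];
        int assigned = 0;
        int lastRelative = -1;
        for (int i = 0; i < specs.length; i++) {
            if (specs[i].endsWith("r")) {
                sizes[i] = remaining * weightOf(specs[i]) / weightSum;
                assigned += sizes[i];
                lastRelative = i;
            } else {
                sizes[i] = Integer.parseInt(specs[i]);
            }
        }
        // Integer division may leave a few blocks over; give them to the last relative part.
        if (lastRelative >= 0) {
            sizes[lastRelative] += remaining - assigned;
        }
        return sizes;
    }

    /** "2r" has weight 2; a bare "r" is treated as weight 1. */
    private static int weightOf(String spec) {
        String digits = spec.substring(0, spec.length() - 1);
        int weight = digits.isEmpty() ? 1 : Integer.parseInt(digits);
        if (weight <= 0) {
            throw new IllegalArgumentException("Relative weight must be positive: " + spec);
        }
        return weight;
    }
}
```

For a building that is 12 blocks high, `resolveSizes(12, "3", "1r")` gives a 3-block ground floor and leaves the remaining 9 blocks to the roof.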
 