codedread

Grammar


This page discusses the grammar of my Config File Parser and the error that I made in that grammar. Some familiarity with BNF is probably necessary to make sense of it.

My original configuration file grammar in EBNF is (note: these rules are defined almost verbatim this way in my code, thanks to boost::spirit):

    // Comments
    <cpp_comment>     ::= "//" (<any character> - <end of line>)* <end of line>
    <c_comment>       ::= "/*" (<any character> - "*/")* "*/"
    <comment>         ::= (<cpp_comment> | <c_comment>)

    // Primitive Values
    <boolean_value>   ::= ("true" | "false")
    <real_value>      ::= ['-'] <digit>+ '.' <digit>*
    <int_value>       ::= ['-'] <digit>+
    <double_quote>    ::= '"'
    <escape_codes>    ::= '\' ('a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' | <double_quote> | '\')
    <string_value>    ::= <double_quote> ((<printable character> - <double_quote> - '\')* | <escape_codes>) <double_quote>
    <primitive_value> ::= ( <boolean_value> | <real_value> | <int_value> | <string_value> )

    // Property Name
    <property_name>   ::= ((<alpha character> | '_') (<alpha character> | <digit> | '_')*) - <boolean_value>
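To get a feel for how the primitive rules behave, here is a rough sketch that translates them into Python regular expressions. This is not the Spirit code from the parser. The names `PRIMITIVES`, `NAME`, `match_primitive` and `match_property_name` are made up for illustration, and "printable character" and "alpha character" are assumed to mean their ASCII ranges. The rules are tried in the same order as the alternatives in `<primitive_value>`, so a real such as `4.2` is recognised before the int rule gets a chance to stop at `4`.

```python
import re

# Regex stand-ins for the primitive rules; these names are illustrative only.
# Order mirrors <primitive_value>: boolean, real, int, string.
PRIMITIVES = [
    ("boolean", re.compile(r"true|false")),
    ("real", re.compile(r"-?[0-9]+\.[0-9]*")),
    ("int", re.compile(r"-?[0-9]+")),
    # Translated literally from <string_value>: a run of plain printable
    # characters OR a single escape code between the double quotes.
    ("string", re.compile(r'"(?:[ !#-\[\]-~]*|\\[abfnrtv"\\])"')),
]

# (<alpha character> | '_') (<alpha character> | <digit> | '_')*
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def match_primitive(text, pos=0):
    """Return (kind, end) for the first rule that matches at pos, else None."""
    for kind, pattern in PRIMITIVES:
        m = pattern.match(text, pos)
        if m:
            return kind, m.end()
    return None


def match_property_name(text, pos=0):
    """Return the end offset of a property name at pos, else None.

    The "- <boolean_value>" part of the rule is modelled by rejecting an
    identifier that is exactly "true" or "false".
    """
    m = NAME.match(text, pos)
    if m is None or m.group() in ("true", "false"):
        return None
    return m.end()
```

For example, `match_primitive("4.2;")` reports a real value ending at offset 3, while `match_property_name("true")` returns `None` because of the subtraction in `<property_name>`.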

So far, so good. The problem was introduced when I attempted to define higher rules based off of these simple rules:

    // Assignment Rules
    <assignment>           ::= ( <simple_assignment> | <composite_assignment> )
    <simple_assignment>    ::= <property_name> '=' <primitive_value> ';'
    <composite_assignment> ::= <property_name> '{' <assignment>* '}'

    // Final Rule
    <configuration>        ::= ( <assignment> | <comment> )*

At first, the above seemed like the right way to go, at least on the surface. But the problem lies in the last four parsing rules: none of the assignment rules allows a comment to be embedded, and the final rule only accepts comments between top-level assignments. Thus, the following desired configuration file examples would all be rejected by the parser:

    XPosition // Example value
        = 4.2;

    MaxHealth = // Change this to 10 for release build
        10000 ;

    MaxLives = 500 //3
    ;

    Player
    {
        // Player's Unique ID
        ID = 40;
    }

As can be verified above, there is no rule that allows comments to be embedded inside simple or composite assignment statements. This can be remedied by changing 2 of the grammar rules like so:

    // Assignment Rules
    <simple_assignment>    ::= <property_name> (<comment>)* '=' (<comment>)* <primitive_value> (<comment>)* ';'
    <composite_assignment> ::= <property_name> (<comment>)* '{' <configuration> '}'
    <configuration>        ::= ( <assignment> | <comment> )*
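To see the effect of the change without Spirit, here is a minimal recursive-descent recognizer in Python that can run either version of the assignment rules. It is only a sketch under some assumptions: whitespace is skipped before every token, a `//` comment may also end at the end of input, and string values use a simplified pattern. The names `Recognizer` and `accepts` are hypothetical and exist only for this example.

```python
import re

WS = re.compile(r"\s*")
# <comment>: "//" to end of line (or end of input, an assumption) or "/* ... */"
COMMENT = re.compile(r"//[^\n]*(?:\n|$)|/\*.*?\*/", re.S)
# <property_name>: an identifier that is not exactly "true" or "false"
NAME = re.compile(r"(?!(?:true|false)\b)[A-Za-z_][A-Za-z0-9_]*")
# <primitive_value>, with a simplified string pattern
VALUE = re.compile(r'true|false|-?[0-9]+\.[0-9]*|-?[0-9]+|"(?:[^"\\\n]|\\.)*"')
EQUALS, SEMI = re.compile(r"="), re.compile(r";")
LBRACE, RBRACE = re.compile(r"\{"), re.compile(r"\}")


class Recognizer:
    def __init__(self, text, fixed):
        self.text = text
        self.fixed = fixed  # True: corrected rules, False: original rules

    def token(self, pattern, pos):
        """Skip whitespace, then match pattern; return end offset or None."""
        pos = WS.match(self.text, pos).end()
        m = pattern.match(self.text, pos)
        return m.end() if m else None

    def comments(self, pos):
        """(<comment>)* in the corrected rules; consumes nothing otherwise."""
        while self.fixed:
            end = self.token(COMMENT, pos)
            if end is None:
                break
            pos = end
        return pos

    def simple(self, pos):
        """<property_name> '=' <primitive_value> ';' with optional comments."""
        pos = self.token(NAME, pos)
        for pattern in (EQUALS, VALUE, SEMI):
            if pos is None:
                return None
            pos = self.token(pattern, self.comments(pos))
        return pos

    def composite(self, pos):
        """<property_name> '{' body '}' where the body depends on the version."""
        pos = self.token(NAME, pos)
        if pos is None:
            return None
        pos = self.token(LBRACE, self.comments(pos))
        if pos is None:
            return None
        # Original body: <assignment>*; corrected body: <configuration>
        return self.token(RBRACE, self.items(pos, allow_comments=self.fixed))

    def items(self, pos, allow_comments):
        """Zero or more assignments, optionally interleaved with comments."""
        while True:
            end = self.simple(pos)
            if end is None:
                end = self.composite(pos)
            if end is None and allow_comments:
                end = self.token(COMMENT, pos)
            if end is None:
                return pos
            pos = end


def accepts(text, fixed):
    """True if the whole text matches <configuration> under the chosen rules."""
    end = Recognizer(text, fixed).items(0, allow_comments=True)
    return WS.match(text, end).end() == len(text)
```

Calling `accepts(text, fixed=False)` on an assignment with a comment before the equals sign returns `False`, while `accepts(text, fixed=True)` returns `True`. Input that keeps its comments between top-level assignments is accepted either way.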

It ain't pretty, but that's how it is: If we want to make the comments injectable anywhere before and after the equals sign and before the semicolon, we must clutter up our parsing rules with optional comments everywhere.

What's interesting to note is that Spirit allowed me to focus on correcting the grammar itself without worrying about updating any parsing code. I simply corrected my grammar, and the parser was corrected on my next compile.
