Mammut Online Documentation

Mammut is an LALR(1) parser generator based on the source code of Bison++. Mammut is released under the GNU General Public License (GPL). See http://www.gnu.org/copyleft/gpl.html for details.

Bison++ is in turn based on the Bison source code and adds constructs for creating C++ parser classes. Bison++ tries to keep the same interface and behaviour as the original Bison, except for the object-oriented part.

The Mammut project was started as a reaction to the difficulties of maintaining and debugging code generated by Bison++. Bison++ depends heavily on #defines, which makes it hard to follow the defined values when, for example, debugging. Mammut, on the other hand, is based on C++ templates and offers a cleaner separation of the parser, the grammar and the semantic actions. This makes it possible to reuse the parser and grammar with different semantic actions. In addition, all generated code and the parser generator code itself have been thoroughly cleaned up and restructured, which should make future enhancements and extensions easier.

Related Work

Bottom-up Parsers

Mammut was built from the codebase of the Bison++ project (generates C++ code), which in turn is built upon the GNU Bison project (generates C code) (http://www.gnu.org/software/bison/bison.html).

BtYacc (http://gnuwin32.sourceforge.net/packages/btyacc.htm) is a backtracking LALR(1) parser generator in which ambiguities can be resolved through backtracking. C and C++ are supported.

Other Parsers

The D-parser (http://dparser.sourceforge.net/) is a flexible scannerless GLR parser based on the Tomita algorithm.

The boost::spirit (http://www.boost.org/libs/spirit/index.html) parser is a recursive-descent parser generator which supports EBNF grammars. Spirit is a DSEL (Domain Specific Embedded Language) implemented in C++. Grammars are written in template-based standard C++ code.

Example: Obj Loader

Let's start with a simple example of using Mammut.

The OBJ file format is commonly used in 3D graphics applications to store mesh data. The grammar is shown in the ObjGrammar.y file. It looks mostly like an ordinary Bison++ grammar file. The big difference is that a delegating object pointer called "act" is used for the semantic actions. This behaviour is not mandatory and can be changed by modifying the Mammut template files.

The parse function is a free-standing function that accepts a lexical analyser and a grammar as arguments:

    ObjGrammar<ObjActions> grammar;
    parse(lexer, grammar);

The first line is an instantiation of the generated ObjGrammar with the locally defined ObjActions. This grammar is then used by the parser along with a lexical analyser. The lexical analyser in this example uses a flex++ class wrapped in a Flex<> class. A lexical analyser must follow the concept of having a NextToken(...) member function, which returns the next token.
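The lexer concept can be illustrated with a small hand-written class. This is only a sketch: the text above does not give the signature of NextToken(...), so the parameterless form returning an int token code, the Token enum, the WordLexer class and the countTokens helper are all assumptions made for illustration and are not part of Mammut.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token codes; a real grammar would define its own.
enum Token { TOKEN_END = 0, TOKEN_WORD = 1 };

// A toy lexer that satisfies the assumed concept: it has a NextToken()
// member function that returns the next token.
class WordLexer {
public:
    explicit WordLexer(const std::string& text) : text_(text), pos_(0) {}

    int NextToken() {
        // Skip whitespace between words.
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        if (pos_ == text_.size()) return TOKEN_END;
        // Consume one whitespace-delimited word.
        while (pos_ < text_.size() &&
               !std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        return TOKEN_WORD;
    }

private:
    std::string text_;
    std::size_t pos_;
};

// Any type with a suitable NextToken() can be passed here. No common base
// class is needed, because the requirement is checked when the template
// is instantiated.
template <class Lexer>
int countTokens(Lexer& lexer) {
    int count = 0;
    while (lexer.NextToken() != TOKEN_END) ++count;
    return count;
}
```

With this sketch, a WordLexer constructed from the text "v 1.0 2.0 3.0" yields four word tokens before the end token.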

We have now created a parser that builds mesh objects from the obj-files in our application. Imagine that we would also like a function that just reads an obj-file and writes statistics about it. Instead of writing a new grammar file, we create a new ObjInfoAction class that gathers information about the obj-file. Using both parsers would look like this:

    ObjGrammar<ObjActions> grammar;
    ObjGrammar<ObjInfoAction> infoGrammar;
    parse(lexer, grammar);
    parse(lexer, infoGrammar);
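The idea of reusing one grammar with different semantic actions can be sketched without Mammut at all. In the following illustration, ToyObjGrammar, MeshActions, InfoActions and the onVertex/onFace callbacks are hypothetical stand-ins, and "act" is a plain member rather than a pointer. Only the pattern of a grammar template that delegates to an "act" object is taken from the text above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical action class that builds mesh data.
struct MeshActions {
    std::vector<float> vertices;  // stored as x, y, z triples
    void onVertex(float x, float y, float z) {
        vertices.push_back(x);
        vertices.push_back(y);
        vertices.push_back(z);
    }
    void onFace() {}  // a real loader would store face indices here
};

// Hypothetical action class that only gathers statistics.
struct InfoActions {
    std::size_t vertexCount;
    std::size_t faceCount;
    InfoActions() : vertexCount(0), faceCount(0) {}
    void onVertex(float, float, float) { ++vertexCount; }
    void onFace() { ++faceCount; }
};

// Stand-in for a generated grammar: the recognition logic is written once
// and every semantic action is delegated to the "act" member.
template <class Actions>
struct ToyObjGrammar {
    Actions act;

    // Handles only "v x y z" and "f ..." lines; other lines are ignored.
    void parseText(const std::string& text) {
        std::istringstream lines(text);
        std::string line;
        while (std::getline(lines, line)) {
            std::istringstream fields(line);
            std::string tag;
            fields >> tag;
            if (tag == "v") {
                float x = 0, y = 0, z = 0;
                fields >> x >> y >> z;
                act.onVertex(x, y, z);
            } else if (tag == "f") {
                act.onFace();
            }
        }
    }
};
```

Instantiating ToyObjGrammar<MeshActions> and ToyObjGrammar<InfoActions> on the same input runs the same recognition code but produces a vertex buffer in one case and counters in the other.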

Sometimes it is necessary to debug the parsing process. This is easily accomplished:

    parse<ConsoleDebug>(lexer, grammar);

ConsoleDebug is a simple class that writes state changes and other informational messages to std::cout.

Default parsing uses the NoDebug class, which removes all debugging code at compile time. It is easy to extend NoDebug or ConsoleDebug, or to write completely new debugging classes as needed.
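How a debug class can be chosen at compile time is sketched below. The interface of Mammut's NoDebug and ConsoleDebug is not documented here, so SketchNoDebug, SketchConsoleDebug, the stateChange hook and the runStates driver are hypothetical names used only to show the general policy-class technique.

```cpp
#include <iostream>

// Hypothetical silent policy: the hook is an empty inline function, so an
// optimising compiler can remove the calls entirely.
struct SketchNoDebug {
    static void stateChange(int /*from*/, int /*to*/) {}
};

// Hypothetical policy that reports every state change on std::cout.
struct SketchConsoleDebug {
    static void stateChange(int from, int to) {
        std::cout << "state " << from << " -> " << to << '\n';
    }
};

// Toy driver standing in for a parse function: it steps through `steps`
// states and reports each transition to the Debug policy.
template <class Debug>
int runStates(int steps) {
    int state = 0;
    for (int i = 0; i < steps; ++i) {
        const int next = state + 1;
        Debug::stateChange(state, next);
        state = next;
    }
    return state;
}

// Overload that selects the silent policy when none is named.
inline int runStates(int steps) { return runStates<SketchNoDebug>(steps); }
```

Calling runStates(3) runs silently, while runStates<SketchConsoleDebug>(3) executes the same loop and prints each transition. A new policy only has to provide the same hook.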

Mammut

Mammut was built on the codebase of Bison++, but there have been substantial changes in code and functionality. The obvious change for the end user is that the files produced by Mammut are cleaner and free from #defines. Instead, templates are used heavily. This not only makes it easier to see exactly what code has been generated, but also gives us type-independent code. This can be used, for example, to easily create different parsers from the same grammar (as in the introductory example).

Mammut has been split into three separate parts - parser, grammar and semantic actions:

Parser: The parse-function never changes, regardless of what grammar file (.y) is used. The parser has therefore been separated out to the file mammut/parser.h.
Grammar: This is what differentiates parsers for different languages. The grammar is used to create data structures with values that the parse function in turn uses to accomplish its task. The grammar is what the Mammut tool generates.
Semantic Actions: In Bison++, semantic actions easily became intertwined with the grammar files (.y). In Mammut they are more separated from the grammar and are largely defined in their own implementation files written by the user. The core of the semantic actions, however, still resides in the grammar file.

Template Files

Another big difference from Bison++ is the increased support for, and flexibility of, the template files. Bison++ uses skeleton files for the header and implementation files. Mammut adds flexibility in how the grammar data is extracted and composed: instead of having the parser generator emit code in a hard-coded fashion, Mammut uses template files that extract the desired data. Effort has been put into making the extractable data as language independent as possible, and putting language dependence into the template files instead. It should be possible to use the Mammut tool to create parser generators for other languages such as Java and C# by creating specific template files and translating the parse function. Support for other languages might be included by default in later releases.

The template file has one reserved character, '$'. The dollar sign indicates the beginning of a Mammut keyword. A literal dollar sign is written as '$$'. Most of the keywords are present in the mammut.cpp.template file. [A future extension will expose all keywords through a command line parameter.]
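The substitution rule can be sketched as a small function. This is not Mammut's implementation: expandKeywords is a hypothetical helper, and it assumes that keyword names consist of letters, digits and '_' and that unknown keywords are copied through unchanged, neither of which is stated above.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Expands "$name" using `keywords` and turns "$$" into a literal '$'.
// Assumptions: names consist of letters, digits and '_'; unknown keywords
// are left in the output exactly as written.
std::string expandKeywords(const std::string& text,
                           const std::map<std::string, std::string>& keywords) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '$') {
            out += text[i++];
            continue;
        }
        if (i + 1 < text.size() && text[i + 1] == '$') {  // escaped dollar
            out += '$';
            i += 2;
            continue;
        }
        // Find the end of the keyword name that follows the '$'.
        std::size_t end = i + 1;
        while (end < text.size() &&
               (std::isalnum(static_cast<unsigned char>(text[end])) ||
                text[end] == '_'))
            ++end;
        const std::string name = text.substr(i + 1, end - i - 1);
        std::map<std::string, std::string>::const_iterator it =
            keywords.find(name);
        if (it != keywords.end())
            out += it->second;
        else
            out += text.substr(i, end - i);  // keep "$name" as written
        i = end;
    }
    return out;
}
```

With a map that binds headerfile to Output.h, the input "#include <$headerfile>" expands to "#include <Output.h>", and "$$" comes out as a single dollar sign.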

Other Differences

Bison and similar tools have gone to great lengths to maintain command-line compatibility with ancient incarnations of parser generators. In Mammut we have tried to simplify the options to suit the changed behaviour of the parser generator.

Another big difference from Bison++ is that the Mammut code-base includes C++ code.

Important! Mammut uses a configuration directory that should be placed in the same directory as the mammut executable. See how the dist directory is organized. Both the template files and the default configuration file are stored in the configuration directory.

Command line Options

Usage: mammut [-Vvul] [-o outfile] [-t template file] [--no-lines] [--verbose] [--version] [--usage] [--help] [--template=template file] [--output-file=outfile] grammar-file

-V: Prints out version information.
-v, --verbose: Verbose output. Prints detailed information during parser generation.
--help: Prints out the command line options.
-l, --nolines: Omits the line information in the generated source files. This is useful for detailed debugging.
-n, --noconfig: Do not use the default config file.
-c filename, --config filename: Specifies the configuration file name. This file is used to generate multiple output files from a single command. The default configuration file is used if nothing is specified. The default configuration will generate one *.cpp and one *.h-file. See also the --prefix option.
-p argument, --prefix argument: Set the output file prefix. This option is reachable from the configuration file through the ${file_prefix} directive.
-k arguments, --keywords arguments: Adds keyword/value pairs that can be referenced from the template files. E.g. "-k headerfile=Output.h" would allow the directive "#include <$headerfile>" in a template file to generate "#include <Output.h>". Multiple pairs can be supplied as a comma-separated list (e.g. -kheaderfile=Output.h,myvariable=MyName,myversion=MyVersion). Warning: when the '-k' argument is used in a configuration file, the key/value pairs are active from the point where they are defined to the end of the configuration file.
-o filename, --output filename: Specifies the output file name. This is the file that is generated from the template and grammar files. If a configuration file is being executed, the output file is accessed as "${output_file}" in the config-file.
-t filename, --template filename: Specifies the template file to be used. Normally mammut.cpp.template and mammut.h.template are used to create the resulting .cpp and .h files. Accessed as "${output_template}" in a config-file.
--input: Specifies the grammar file used as input. The --input option can be omitted if the grammar file is specified as the very last argument.
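The comma-separated argument of the -k option described above can be split into key/value pairs along the following lines. This is a hedged sketch, not Mammut's own code: parseKeywordList is a hypothetical helper, and it assumes that entries without '=' are skipped and that keys and values never contain a comma.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits "key=value,key=value" into a map.
// Assumptions: entries without '=' or with an empty key are skipped, and
// neither keys nor values contain ','.
std::map<std::string, std::string> parseKeywordList(const std::string& arg) {
    std::map<std::string, std::string> result;
    std::size_t start = 0;
    while (start <= arg.size()) {
        // Locate the end of the current entry.
        std::size_t comma = arg.find(',', start);
        if (comma == std::string::npos) comma = arg.size();
        const std::string entry = arg.substr(start, comma - start);
        // Split the entry at the first '='.
        const std::size_t eq = entry.find('=');
        if (eq != std::string::npos && eq > 0)
            result[entry.substr(0, eq)] = entry.substr(eq + 1);
        start = comma + 1;
    }
    return result;
}
```

Applied to "headerfile=Output.h,myvariable=MyName,myversion=MyVersion", the sketch produces three entries, which a template expander could then look up when it meets $headerfile, $myvariable or $myversion.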

The Future

Mammut is the result of immediate needs within Lumai to create more modern C++ parser classes. Future versions might include support for other programming languages such as Java, C, and C# as well as other minor improvements to the parser generator.

Building the source

Prerequisites

Building the source requires Boost 1.32 or newer (http://www.boost.org) to be installed and configured. Mammut uses the filesystem and program_options libraries from Boost.

Windows + Microsoft Visual Studio .net 2003

Use the solution provided in the Build/VS .Net 2003 folder. Earlier versions of Visual Studio are not supported due to template problems. Executables will be created in the Dist folder.

Linux + GCC

Execute the script file make.sh in the Build/LinuxGcc folder. (cd Build/LinuxGcc ; ./make.sh) This will build the executable mammut in the same folder. Feel free to contribute a proper make script. This was tested with GCC 3.3.5 although newer versions of GCC should not be a problem.

Document version history

2005-06-30 Martin Skogevall: First version

2005-07-02 Per Lindén: Minor fixes and "Building the source" section added

2005-07-20 Martin Skogevall: Program options section was completed.