
Mammut Online Documentation
Mammut is an LALR(1) parser generator based on the source code of Bison++. Mammut is released under the GNU General Public License (GPL). See http://www.gnu.org/copyleft/gpl.html for details.
Bison++ is in turn based on the Bison source code and introduces constructs for creating C++ parser classes. Bison++ tries to keep the same interface and behaviour as the original Bison, except for the object-oriented parts.
The Mammut project was started as a reaction to the difficulties of maintaining and debugging Bison++ generated code. Bison++ is heavily dependent on #defines, which makes it hard to follow the defined values when, e.g., debugging. Mammut, on the other hand, is based on C++ templates and offers a cleaner separation of the parser, the grammar and the semantic actions. This makes it possible to reuse the parser and grammar with different semantic actions. In addition, all generated code and parser generator code has been heavily cleaned up and restructured, which also makes future enhancements and extensions easier.
Related Work
Bottom-up Parsers
Mammut was built from the codebase of the Bison++ project (generates C++ code), which in turn is built upon the GNU Bison project (generates C code) (http://www.gnu.org/software/bison/bison.html).
BtYacc (http://gnuwin32.sourceforge.net/packages/btyacc.htm) is a backtracking LALR(1) parser generator in which ambiguities can be resolved through backtracking. C/C++ is supported.
The D-parser (http://dparser.sourceforge.net/) is a flexible, scannerless GLR parser generator based on the Tomita algorithm.
Top-down Parsers
The boost::spirit (http://www.boost.org/libs/spirit/index.html) parser is a recursive-descent parser generator which supports EBNF grammars. Spirit is a DSEL (Domain Specific Embedded Language) implemented in C++; grammars are written as template-based standard C++ code.
Example: Obj Loader
Let's start off with a simple example of using Mammut.
The OBJ file format is commonly used in 3D graphics applications to store mesh data. The grammar, found in the ObjGrammar.y file, looks mostly like an ordinary Bison++ grammar file. The main difference is that semantic actions are invoked through a delegating object pointer called "act". This is not mandatory and can be changed by modifying the Mammut template files.
The parse-function is a free-standing function that accepts a lexical analyser and a grammar as arguments:

    ObjGrammar<ObjActions> grammar;
    parse(lexer, grammar);

Here, ObjGrammar<ObjActions> grammar; is an instantiation of the generated ObjGrammar with the locally defined ObjActions. This grammar is then used by the parser along with a lexical analyser. The lexical analyser in this example uses a flex++ class wrapped in a Flex<> class. A lexical analyser must follow a concept of having a NextToken(...) member function, which returns the next token.
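Because the lexer requirement is a compile-time concept rather than a base class, any type with a suitable NextToken member can be handed to a template function. The sketch below illustrates that idea with a hypothetical hand-written StringLexer; the token codes, the argument-less NextToken() signature and the "0 means end of input" convention are assumptions for illustration and are not taken from Mammut's Flex<> wrapper.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token codes; a real grammar would use the codes generated by Mammut.
enum Token { END_OF_INPUT = 0, NUMBER = 1, PLUS = 2 };

// Minimal lexer. It satisfies the (assumed) concept only by providing a
// NextToken() member function; no common base class is involved.
class StringLexer {
public:
    explicit StringLexer(std::string text) : text_(std::move(text)) {}

    int NextToken() {
        // Skip whitespace between tokens.
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
        if (pos_ == text_.size())
            return END_OF_INPUT;
        if (std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
            // A run of digits forms one NUMBER token.
            while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_])))
                ++pos_;
            return NUMBER;
        }
        char c = text_[pos_++];
        return c == '+' ? PLUS : -1;  // -1 marks an unrecognised character
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// Code written against the concept accepts any lexer type with NextToken();
// here it simply counts tokens until end of input.
template <class Lexer>
int countTokens(Lexer& lexer) {
    int n = 0;
    while (lexer.NextToken() != END_OF_INPUT)
        ++n;
    return n;
}
```

Swapping in a flex++-based lexer would only require that its wrapper offers the same member function; nothing in countTokens would change.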
Now we have a parser that builds mesh objects from the obj-files in our application. Imagine that we would also like a function that just reads an obj-file and writes statistics about it. Instead of writing a new grammar file, we create a new ObjInfoAction class that gathers information about the obj-file. The application would then use both parsers like this:

    ObjGrammar<ObjActions> grammar;
    ObjGrammar<ObjInfoAction> infoGrammar;
    parse(lexer, grammar);
    parse(lexer, infoGrammar);
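The reuse works because the grammar only talks to its semantic actions through a template parameter and the delegating "act" pointer. The following sketch imitates that design with a hypothetical, hand-written MiniObjGrammar; the reduceVertex/reduceFace hooks, the vertex/face action members and the way the grammar owns its actions object are all assumptions for illustration, whereas a real ObjGrammar is generated by Mammut and driven by the parse-function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated grammar class. The grammar does not
// know what the actions do; each reduction is forwarded through "act".
template <class Actions>
class MiniObjGrammar {
public:
    MiniObjGrammar() : act(&actions_) {}
    MiniObjGrammar(const MiniObjGrammar&) = delete;             // act points into *this
    MiniObjGrammar& operator=(const MiniObjGrammar&) = delete;

    // Would run when a rule like  vertex : 'v' NUMBER NUMBER NUMBER  is reduced.
    void reduceVertex(double x, double y, double z) { act->vertex(x, y, z); }
    // Would run when a face rule is reduced (OBJ indices are 1-based).
    void reduceFace(const std::vector<int>& indices) { act->face(indices); }

    Actions& actions() { return actions_; }

private:
    Actions actions_;
    Actions* act;  // delegating pointer used by the rule bodies
};

// Actions that build mesh data, in the spirit of ObjActions.
struct MeshActions {
    std::vector<double> coords;
    std::vector<std::vector<int>> faces;
    void vertex(double x, double y, double z) { coords.insert(coords.end(), {x, y, z}); }
    void face(const std::vector<int>& idx) { faces.push_back(idx); }
};

// Actions that only gather statistics, in the spirit of ObjInfoAction.
struct InfoActions {
    std::size_t vertices = 0, faces = 0;
    void vertex(double, double, double) { ++vertices; }
    void face(const std::vector<int>&) { ++faces; }
};
```

The same grammar template is instantiated twice, once per action class, and the compiler checks at instantiation time that each action class provides the members the grammar calls.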
Sometimes it is necessary to debug the parsing process. This is easily accomplished:

    parse<ConsoleDebug>(lexer, grammar);

ConsoleDebug is a simple class that writes state changes and other informational messages to std::cout. Default parsing uses the NoDebug class, which removes all debugging code at compile time. It is easy to extend NoDebug or ConsoleDebug, or to write completely new debugging classes as needed.
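A debug class passed as a template argument acts as a compile-time policy: when its hooks are empty inline functions, an optimising compiler can drop the calls entirely. The sketch below shows the pattern with hypothetical NoDebugPolicy and ConsoleDebugPolicy classes and a toy driver called toyParse; the hook names shift and reduce are assumptions and need not match the members of Mammut's real NoDebug and ConsoleDebug classes.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <vector>

// Policy that does nothing; the empty hooks compile away after inlining.
struct NoDebugPolicy {
    static void shift(int /*state*/, int /*token*/) {}
    static void reduce(int /*rule*/) {}
};

// Policy that reports each step on std::cout.
struct ConsoleDebugPolicy {
    static void shift(int state, int token) {
        std::cout << "shift token " << token << ", goto state " << state << '\n';
    }
    static void reduce(int rule) { std::cout << "reduce by rule " << rule << '\n'; }
};

// Extending a policy: inherit the empty hooks and hide only the one you need.
struct ReduceCounter : NoDebugPolicy {
    static inline int reductions = 0;  // C++17 inline static member
    static void reduce(int) { ++reductions; }
};

// Toy driver standing in for the parse-function: it "shifts" every token and
// "reduces" once at the end, reporting through the Debug policy chosen at
// compile time. Returns the number of tokens shifted.
template <class Debug = NoDebugPolicy>
int toyParse(const std::vector<int>& tokens) {
    int state = 0;
    for (int token : tokens) {
        ++state;
        Debug::shift(state, token);
    }
    Debug::reduce(1);
    return state;
}
```

Calling toyParse(tokens) uses the silent default, while toyParse<ConsoleDebugPolicy>(tokens) prints a trace, mirroring the parse<ConsoleDebug>(lexer, grammar) call above.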
Mammut
Mammut was built on the codebase of Bison++, but there have been substantial changes in code and functionality. The most obvious change for the end user is that the files produced by Mammut are cleaner and free from #defines. Instead, templates are used heavily. This not only gives us a better chance to see exactly what code has been generated, but also gives us type-independent code. This can be used, e.g., to easily create different parsers with the same grammar (as in the introductory example).
Mammut has been split into three separate parts (parser, grammar and semantic actions):
- Parser
- The parse-function never changes, regardless of which grammar file (.y) is used. The parser has therefore been separated out into the file mammut/parser.h.
- Grammar
- This is what differentiates parsers for different languages. The grammar is used to create data structures with values that the parse-function in turn uses to accomplish its task. The grammar is what the Mammut tool generates.
- Semantic Actions
- In Bison++, semantic actions easily became intertwined with the grammar files (.y). In Mammut they are more separated from the grammar and are largely defined in their own implementation files written by the user. The core of the semantic actions, however, still resides in the grammar file.
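The split between a fixed parse-function and generated grammar data can be illustrated with a table-driven LR driver that knows nothing about the language it parses. The sketch below is a hypothetical, simplified analogue, not the code in mammut/parser.h: the Action/Tok structs, the lrParse name and the SumGrammar interface (action, gotoState, ruleLength, ruleLhs, reduce) are assumptions. SumGrammar holds hand-built LR tables for the toy grammar E : E '+' NUM | NUM, and its reduce member plays the role of the semantic actions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum { T_END = 0, T_NUM = 1, T_PLUS = 2 };   // hypothetical token codes
struct Tok { int kind; int value; };          // token code plus semantic value
struct Action { char kind; int arg; };        // 's' shift to state arg, 'r' reduce by
                                              // rule arg, 'a' accept, 'e' error

// Grammar-independent LR driver: everything language-specific comes from g.
template <class Grammar>
bool lrParse(const std::vector<Tok>& input, const Grammar& g, int& result) {
    std::vector<int> states{0};  // LR state stack
    std::vector<int> values{0};  // semantic value stack, parallel to states
    std::size_t pos = 0;
    for (;;) {
        int token = pos < input.size() ? input[pos].kind : T_END;
        Action a = g.action(states.back(), token);
        if (a.kind == 's') {
            states.push_back(a.arg);
            values.push_back(pos < input.size() ? input[pos].value : 0);
            ++pos;
        } else if (a.kind == 'r') {
            int len = g.ruleLength(a.arg);
            std::vector<int> rhs(values.end() - len, values.end());
            states.resize(states.size() - len);
            values.resize(values.size() - len);
            int next = g.gotoState(states.back(), g.ruleLhs(a.arg));
            if (next < 0) return false;
            states.push_back(next);
            values.push_back(g.reduce(a.arg, rhs));  // run the semantic action
        } else if (a.kind == 'a') {
            result = values.back();
            return true;
        } else {
            return false;  // syntax error
        }
    }
}

// Hand-built tables for: rule 1  E -> E '+' NUM,  rule 2  E -> NUM.
struct SumGrammar {
    Action action(int state, int token) const {
        static const Action table[5][3] = {
            //  END        NUM        PLUS
            {{'e', 0}, {'s', 2}, {'e', 0}},  // state 0
            {{'a', 0}, {'e', 0}, {'s', 3}},  // state 1
            {{'r', 2}, {'e', 0}, {'r', 2}},  // state 2
            {{'e', 0}, {'s', 4}, {'e', 0}},  // state 3
            {{'r', 1}, {'e', 0}, {'r', 1}},  // state 4
        };
        return table[state][token];
    }
    // Only state 0 has a transition on the nonterminal E.
    int gotoState(int state, int /*nonterminal*/) const { return state == 0 ? 1 : -1; }
    int ruleLength(int rule) const { return rule == 1 ? 3 : 1; }
    int ruleLhs(int /*rule*/) const { return 0; }  // both rules define E
    int reduce(int rule, const std::vector<int>& rhs) const {
        return rule == 1 ? rhs[0] + rhs[2] : rhs[0];
    }
};
```

Replacing SumGrammar with tables for another language leaves lrParse untouched, which is the property the Parser item above describes.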
Template Files
Another big difference from Bison++ is the increased support for, and flexibility of, the template files. Bison++ uses skeleton files for the header and implementation files. Mammut adds flexibility in extracting and composing the grammar files: instead of having the parser generator emit code in a hard-coded fashion, Mammut uses template files that extract the desired data. Effort has been put into making the extractable data as language-independent as possible and moving language dependence into the template files instead. It should therefore be possible to use the Mammut tool to generate parsers for other languages, such as Java and C#, by creating specific template files and translating the parse-function. Support for other languages might be included by default in later releases.
Template files have one reserved character, '$'. The dollar sign marks the beginning of a Mammut keyword; a literal dollar sign is written as '$$'. Most of the keywords are present in the mammut.cpp.template file. [A future extension will expose all keywords through a command line parameter.]
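The '$' convention can be made concrete with a small expander. This documentation shows keywords written both as $name (as in "#include <$headerfile>") and as ${name} (as in ${file_prefix}), so the sketch below accepts both forms and turns '$$' into a literal '$'. It is a hypothetical illustration, not Mammut's implementation; in particular, the rule that an unbraced keyword ends at the first character that is not a letter, digit or underscore, and the choice to leave unknown keywords unchanged, are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Expands "$name" and "${name}" from the keyword table and "$$" to "$".
// Unknown keywords and an unterminated "${" are copied through unchanged.
std::string expandTemplate(const std::string& text,
                           const std::map<std::string, std::string>& keywords) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '$') { out += text[i++]; continue; }
        if (i + 1 < text.size() && text[i + 1] == '$') {  // escaped dollar sign
            out += '$';
            i += 2;
            continue;
        }
        std::size_t start = i + 1, end = start;
        bool braced = start < text.size() && text[start] == '{';
        if (braced) {
            end = text.find('}', start);
            if (end == std::string::npos) { out += text.substr(i); break; }
            ++start;  // skip '{'
        } else {
            while (end < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[end])) || text[end] == '_'))
                ++end;
        }
        std::string name = text.substr(start, end - start);
        std::size_t next = braced ? end + 1 : end;  // position after the keyword
        auto it = keywords.find(name);
        if (it != keywords.end()) out += it->second;
        else out += text.substr(i, next - i);       // keep unknown keyword as written
        i = next;
    }
    return out;
}
```

For example, with headerfile mapped to Output.h, the line "#include <$headerfile>" expands to "#include <Output.h>".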
Other Differences
Bison and the like have gone to great lengths to maintain command-line compatibility with ancient incarnations of parser generators. In Mammut we have tried to simplify the options to suit the changed behaviour of the parser generator.
Another big difference from Bison++ is that the Mammut code-base includes C++ code.
Important! Mammut uses a configuration directory that should be placed in the same directory as the mammut executable; see how the dist directory is organized. The configuration directory holds both the template files and the default configuration file.
Command line Options
Usage: mammut [-Vvul] [-o outfile] [-t template file] [--no-lines]
[--verbose] [--version] [--usage] [--help] [--template=template file]
[--output-file=outfile] grammar-file
- -V
- Prints out version information.
- -v, --verbose
- Verbose output. Prints detailed information during parser generation.
- --help
- Prints out the command line options.
- -l, --nolines
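- Omits the line information in the generated source files. This is useful for detailed debugging of the generated code itself, since a debugger then refers to the generated files rather than to the grammar file.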
- -n, --noconfig
- Do not use the default config file.
- -c filename, --config filename
- Specifies the configuration file name. This file is used to generate multiple output files from a single command. The default configuration file is used if nothing is specified. The default configuration will generate one *.cpp and one *.h-file. See also the --prefix option.
- -p argument, --prefix argument
- Set the output file prefix. This option is reachable from the configuration file through the ${file_prefix} directive.
- -k arguments, --keywords arguments
- Adds keyword/value pairs that can be referenced from the template files. E.g. "-k headerfile=Output.h" allows the directive "#include <$headerfile>" in a template file to generate "#include <Output.h>". Multiple pairs can be supplied as a comma-separated list (e.g. -kheaderfile=Output.h,myvariable=MyName,myversion=MyVersion). Warning: when the '-k' argument is used in a configuration file, the key/value pairs remain active throughout the rest of the configuration file from the point where they are defined.
- -o filename, --output filename
- Specifies the output file name. This is the file that is generated from the template and grammar files. When a configuration file is being executed, the output file is accessible as "${output_file}" in the config file.
- -t filename, --template filename
- Specifies the template file to be used. Normally mammut.cpp.template and mammut.h.template are used to create the resulting .cpp and .h files. Accessible as "${output_template}" in a config file.
- --input
- Specifies the grammar file used as input. The --input option can be omitted if the grammar file is given as the very last argument.
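The comma-separated form accepted by -k (key=value pairs separated by commas) is simple to split. The sketch below is a hypothetical illustration of that format, not Mammut's own option handling (which, per the build prerequisites, relies on BOOST program_options); the parseKeywordList name, the decision to skip entries without '=', and letting a later duplicate key overwrite an earlier one are assumptions. Values that themselves contain commas cannot be expressed in this format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Splits "key1=value1,key2=value2" into a map. Entries without '=' are
// skipped; a key given twice keeps the value from its last occurrence.
std::map<std::string, std::string> parseKeywordList(const std::string& arg) {
    std::map<std::string, std::string> result;
    std::istringstream in(arg);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::size_t eq = item.find('=');
        if (eq == std::string::npos) continue;
        result[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return result;
}
```

The resulting map is the kind of keyword table a template expander would consult when it meets $headerfile or $myversion.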
The Future
Mammut is the result of immediate needs within Lumai to create more modern C++ parser classes. Future versions might include support for other programming languages such as Java, C, and C# as well as other minor improvements to the parser generator.
Building the source
Prerequisites
The source needs the BOOST package 1.32 or newer (http://www.boost.org) to be installed and configured. Mammut uses the filesystem and program_options libraries from BOOST.
Windows + Microsoft Visual Studio .net 2003
Use the solution provided in the Build/VS .Net 2003 folder. Earlier versions of Visual Studio are not supported because of problems with their template support. Executables will be created in the Dist folder.
Linux + GCC
Execute the script file make.sh in the Build/LinuxGcc folder:

    cd Build/LinuxGcc ; ./make.sh

This will build the executable mammut in the same folder. Feel free to contribute a proper make script. This was tested with GCC 3.3.5, although newer versions of GCC should not be a problem.
Document version history
2005-06-30 Martin Skogevall: First version
2005-07-02 Per Lindén: Minor fixes and "Building the source" section added
2005-07-20 Martin Skogevall: Program options section was completed.
