Mammut is an LALR(1) parser generator based on the source code of bison++. Mammut is released under the GNU General Public License (GPL). See http://www.gnu.org/copyleft/gpl.html for details.
bison++ is in turn based on the bison source code and introduces constructs to be able to create C++ parser classes. bison++ tries to keep the same interface and behaviour as the original bison, except for the OO-part.
The Mammut project was started as a reaction to the difficulties of maintaining
and debugging bison++ generated code. bison++ depends heavily on #defines, and
it becomes hard to follow the defined values when, for example, debugging.
Mammut, on the other hand, is based on C++ templates and offers a cleaner
separation of the parser, grammar and semantic actions. This makes it possible
to reuse the parser and grammar with different semantic actions. In addition,
all generated code and parser generator code has been heavily cleaned up and
restructured, which also makes future enhancements and extensions easier.
Mammut was built from the codebase of the Bison++ project (generates C++ code), which in turn is built upon the GNU Bison project (generates C code) (http://www.gnu.org/software/bison/bison.html).
BtYacc (http://gnuwin32.sourceforge.net/packages/btyacc.htm) is a backtracking LALR(1) parser generator in which ambiguities can be resolved through backtracking. C/C++ is supported.
The D-parser (http://dparser.sourceforge.net/) is a flexible scannerless GLR parser based on the Tomita algorithm.
The boost::spirit (http://www.boost.org/libs/spirit/index.html) parser is a recursive-descent parser generator which supports EBNF grammars. Spirit is a DSEL (Domain Specific Embedded Language) implemented in C++. Grammars are written in template-based standard C++ code.
Let's start off by showing a simple example of using Mammut.
The OBJ file format is commonly used in 3D graphics applications to store mesh data. The grammar is shown in the ObjGrammar.y file. The grammar looks mostly like an ordinary bison++ grammar file. The big difference is that we are using a delegating object pointer called "act" for semantic actions. This is not a forced behaviour and can be changed by modifying the Mammut template files.
ObjGrammar<ObjActions> grammar;
parse(lexer, grammar);
The parse function is a free-standing function that accepts a lexical analyser
and a grammar as arguments. ObjGrammar<ObjActions> grammar; is an instantiation
of the generated ObjGrammar with the locally defined ObjActions. This grammar
is then used by the parser along with a lexical analyser. The lexical analyser
in this example uses a flex++ class wrapped in a Flex<> class. A lexical
analyser must follow the concept of having a NextToken(...) member function,
which returns the next token.
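The lexer concept described above can be illustrated with a small sketch. This is not Mammut's code and not an LALR driver: FakeScanner, LexerAdapter, RecordingGrammar, the yylex() member, the argument-free NextToken() signature and the "0 means end of input" convention are all assumptions made for illustration. It only shows how a free-standing template function can work with any lexer that offers NextToken, and how a wrapper in the spirit of Flex<> can adapt a scanner class to that concept.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated scanner class. Only its
// yylex()-style interface matters here; 0 means end of input.
class FakeScanner {
public:
    explicit FakeScanner(const std::vector<int>& tokens)
        : tokens_(tokens), pos_(0) {}
    int yylex() { return pos_ < tokens_.size() ? tokens_[pos_++] : 0; }
private:
    std::vector<int> tokens_;
    std::size_t pos_;
};

// Wrapper in the spirit of Flex<>: adapts a scanner to the lexer
// concept by exposing NextToken(). The real signature may differ.
template <typename Scanner>
class LexerAdapter {
public:
    explicit LexerAdapter(Scanner& scanner) : scanner_(scanner) {}
    int NextToken() { return scanner_.yylex(); }
private:
    Scanner& scanner_;
};

// Toy grammar: it simply records the tokens it is handed.
struct RecordingGrammar {
    std::vector<int> seen;
    void Accept(int token) { seen.push_back(token); }
};

// Free-standing parse function: it depends only on the lexer concept
// and on the grammar object, not on any particular .y file.
template <typename Lexer, typename Grammar>
void parse(Lexer& lexer, Grammar& grammar) {
    for (int token = lexer.NextToken(); token != 0; token = lexer.NextToken())
        grammar.Accept(token);
}
```

Because parse is a template, any class with a suitable NextToken member can be passed in without inheriting from a common base class.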
Now we have created a parser that builds mesh objects from the obj-files in our
application. Imagine that we would like to have a function that just reads an
obj-file and writes statistics about it. Instead of writing a new grammar file,
we create a new ObjInfoAction class. This action class gathers information
about the obj-file. The application of the new parser would look like this:
ObjGrammar<ObjActions> grammar;
ObjGrammar<ObjInfoAction> infoGrammar;
parse(lexer, grammar);
parse(lexer, infoGrammar);
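A minimal sketch of why this reuse works, assuming a toy grammar rather than anything Mammut generates: the grammar is a template over the action type and forwards every event through a pointer named act, so swapping the action class changes the result without touching the grammar. ToyObjGrammar, MeshActions, InfoActions, Run and the token names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds for a tiny subset of OBJ.
enum ObjToken { TOK_VERTEX, TOK_FACE };

// Toy grammar parameterised on the action type. It delegates through
// a pointer named act; everything else about it is assumed.
template <typename Actions>
class ToyObjGrammar {
public:
    explicit ToyObjGrammar(Actions* actions) : act(actions) {}
    void Reduce(ObjToken token) {
        if (token == TOK_VERTEX) act->Vertex();
        else act->Face();
    }
private:
    Actions* act;
};

// Builds a (very reduced) mesh: one stored index per vertex and face.
struct MeshActions {
    std::vector<int> vertices;
    std::vector<int> faces;
    void Vertex() { vertices.push_back(static_cast<int>(vertices.size())); }
    void Face() { faces.push_back(static_cast<int>(faces.size())); }
};

// Gathers statistics only; nothing is stored.
struct InfoActions {
    int vertexCount;
    int faceCount;
    InfoActions() : vertexCount(0), faceCount(0) {}
    void Vertex() { ++vertexCount; }
    void Face() { ++faceCount; }
};

// Feeds the same token stream to whichever grammar instance it is given.
template <typename Grammar>
void Run(Grammar& grammar, const std::vector<ObjToken>& tokens) {
    for (std::size_t i = 0; i < tokens.size(); ++i)
        grammar.Reduce(tokens[i]);
}
```

Both instantiations share the single grammar definition; only the class passed as the template argument differs.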
Sometimes it is necessary to debug the parsing process. This is easily
accomplished:
parse<ConsoleDebug>(lexer, grammar);
ConsoleDebug is a simple class that writes state changes and other
informational messages to std::cout.
Default parsing uses the NoDebug class, which removes all debugging code at
compile time. It is easy to extend NoDebug or ConsoleDebug, or to write
completely new debugging classes as needed.
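One common way to get this behaviour in C++ is a policy class passed as a template argument. The sketch below assumes that approach; it is not taken from Mammut's sources. The names NoDebugSketch, ConsoleDebugSketch, CountingDebug, StateChange and Walk are hypothetical, and the default template argument on a function template requires C++11 or later.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Debug policy that does nothing. The call is an empty inline
// function, so a compiler can drop it entirely.
struct NoDebugSketch {
    static void StateChange(int, int) {}
};

// Debug policy that reports each state change on std::cout.
struct ConsoleDebugSketch {
    static void StateChange(int from, int to) {
        std::cout << "state " << from << " -> " << to << '\n';
    }
};

// A custom policy: counts state changes instead of printing them.
struct CountingDebug {
    static int& Count() { static int count = 0; return count; }
    static void StateChange(int, int) { ++Count(); }
};

// Toy driver: walks a list of states, reports each transition through
// the Debug policy, and returns the final state (0 if the list is empty).
template <typename Debug = NoDebugSketch>
int Walk(const std::vector<int>& states) {
    int current = 0;
    for (std::size_t i = 0; i < states.size(); ++i) {
        Debug::StateChange(current, states[i]);
        current = states[i];
    }
    return current;
}
```

Calling Walk(states) uses the silent policy, while Walk<ConsoleDebugSketch>(states) prints every transition; the driver itself is unchanged in both cases.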
Mammut was built on the codebase of bison++, but there have been substantial
changes in code and functionality. The obvious change for the end user is that
the files produced by Mammut are cleaner and free from #defines. Instead,
templates are heavily used. This not only gives us a better chance to see
exactly what code has been generated, but also gives us type-independent code.
This can be used, for example, to easily create different parsers with the same
grammar (as in the introductory example).
Mammut has been split into three separate parts - parser, grammar and semantic actions:
The parse function never changes, regardless of what grammar file (.y) is used. The parser has therefore been separated out into the file mammut/parser.h.
The grammar is what the parse function in turn uses to accomplish its task. The grammar is what is generated by the Mammut tool.
Another big difference from bison++ is the increased support for, and flexibility of, the template files. Bison++ uses skeleton files for the header and implementation files. Mammut adds flexibility in extracting and composing the grammar files: instead of having the parser generator emit code in a hard-coded fashion, Mammut uses template files that extract the desired data. Effort has been put into making the extractable data as language independent as possible, and putting language dependence into the template files instead. It should be possible to use the Mammut tool to create parser generators for other languages such as Java and C# by creating specific template files and translating the parser function. Support for other languages might be included by default in later releases.
The template file has one reserved character, '$'. The dollar sign indicates the beginning of a mammut keyword. A literal dollar sign is written as '$$'. Most of the keywords are present in the mammut.cpp.template file. [A future extension will expose all keywords through a command line parameter.]
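The escaping rule can be sketched with a small expansion function. This is an illustration only: the text does not say how a keyword ends or what happens to unknown keywords, so the sketch assumes a keyword is '$' followed by letters, digits or underscores, and that unknown keywords are copied through unchanged. The function name Expand and the keyword classname used in the example below are hypothetical, not taken from mammut.cpp.template.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Replaces $keyword with its value and turns "$$" into a literal '$'.
std::string Expand(const std::string& text,
                   const std::map<std::string, std::string>& values) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != '$') {            // ordinary character
            out += text[i++];
            continue;
        }
        if (i + 1 < text.size() && text[i + 1] == '$') {
            out += '$';                  // "$$" is the escape for '$'
            i += 2;
            continue;
        }
        // Assumed keyword shape: '$' followed by letters, digits or '_'.
        std::size_t j = i + 1;
        while (j < text.size() &&
               (std::isalnum(static_cast<unsigned char>(text[j])) ||
                text[j] == '_'))
            ++j;
        const std::string name = text.substr(i + 1, j - i - 1);
        std::map<std::string, std::string>::const_iterator it =
            values.find(name);
        if (it != values.end())
            out += it->second;           // known keyword: substitute
        else
            out += text.substr(i, j - i); // unknown: copied unchanged
        i = j;
    }
    return out;
}
```

With classname mapped to ObjGrammar, the input "class $classname {};" expands to "class ObjGrammar {};", and "cost: $$5" expands to "cost: $5".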
Bison and the like have gone to great lengths to maintain command-line compatibility with ancient incarnations of parser generators. In Mammut we have tried to simplify the options to suit the changed behaviour of the parser generator.
Another big difference from bison++ is that the Mammut code-base includes C++ code.
Important! Mammut uses a configuration directory that should be placed in the same directory as the mammut executable. See how the dist directory is organized. Both the template files and the default configuration file are stored in the configuration directory.
Usage: mammut [-Vvul] [-o outfile] [-t template file] [--no-lines] [--verbose]
[--version] [--usage] [--help] [--template=template file]
[--output-file=outfile] grammar-file
Mammut is the result of immediate needs within Lumai to create more modern C++ parser classes. Future versions might include support for other programming languages such as Java, C, and C# as well as other minor improvements to the parser generator.
Building the source requires the Boost package, version 1.32 or newer (http://www.boost.org), to be installed and configured. Mammut uses the filesystem and program_options libraries from Boost.
Use the solution provided in the Build/VS .Net 2003 folder. Earlier versions of
Visual Studio are not supported due to template problems. Executables will be
created in the Dist folder.
Execute the script file make.sh in the Build/LinuxGcc folder:
(cd Build/LinuxGcc ; ./make.sh)
This will build the executable mammut in the same folder. Feel free to
contribute a proper make script.
This was tested with GCC 3.3.5 although newer versions of GCC should not be a
problem.
2005-06-30 Martin Skogevall: First version
2005-07-02 Per Lindén: Minor fixes and "Building the source" section added
2005-07-20 Martin Skogevall: Program options section was completed.