To build the internal SQL module for Matrex, I decided to work in the following way:
- Write a parser that converts the SQL expression into an internal object structure
- Write the code that applies the parsed SQL to the matrix/vector arguments of the SQL function
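The two steps meet at the internal object structure. Below is a minimal sketch of the second step. It assumes the parser has already produced a hypothetical `ParsedSelect` object. The convention it encodes is an assumption made only for illustration: columns are selected by zero-based index, and there is a single `WHERE column > threshold` filter. This is not Matrex's actual SQL syntax.

```java
// Hypothetical internal structure produced by step 1 (the parser).
// Assumption for illustration only: columns are referenced by zero-based
// index, and the WHERE clause is a single "column > threshold" test.
final class ParsedSelect {
    final int[] columns;     // columns to keep, in output order
    final int whereColumn;   // column tested by the WHERE condition
    final double threshold;  // keep rows where row[whereColumn] > threshold

    ParsedSelect(int[] columns, int whereColumn, double threshold) {
        this.columns = columns.clone();
        this.whereColumn = whereColumn;
        this.threshold = threshold;
    }

    // Step 2: apply the parsed query to a matrix argument (rows x cols).
    double[][] apply(double[][] matrix) {
        // Count the matching rows first so the result can be sized exactly.
        int matching = 0;
        for (double[] row : matrix) {
            if (row[whereColumn] > threshold) {
                matching++;
            }
        }
        double[][] result = new double[matching][columns.length];
        int out = 0;
        for (double[] row : matrix) {
            if (row[whereColumn] > threshold) {
                // Copy only the selected columns, in the requested order.
                for (int j = 0; j < columns.length; j++) {
                    result[out][j] = row[columns[j]];
                }
                out++;
            }
        }
        return result;
    }
}
```

For example, applying `new ParsedSelect(new int[] {2, 0}, 1, 15)` to a 3x3 matrix keeps the rows whose second column is greater than 15 and returns their third and first columns, in that order. Keeping the parsed structure separate from the matrix code means the parser never needs to know about matrices.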
To do the parsing work, I chose the Antlr library.
Antlr has the following advantages compared to other parser generators:
- More people use it (or at least it seems so)
- It can generate the parser code in different languages (Java, C#, Python...), which can be useful if I want to port the same grammar to other projects
- Together with the library you can download a graphical application called AntlrWorks to interactively test and debug your grammar. AntlrWorks is a very good tool that lets you find errors in your grammar before you start using it.
Antlr is a wonderful product, but I struggled to create the SQL grammar I needed.
The reasons are probably:
- my inexperience with parsers and lexers
- some confusing parts and some gaps in the free documentation
My initial idea was to download Antlr, get a grammar describing the SQL SELECT statement, adapt the grammar to my needs, generate the Java sources from it and convert the resulting ASTs to my internal structures.
Simple, right? Wrong. Here are the problems with this approach:
- Antlr is now at version 3 (usually called v3), but most of the example grammars were written for version 2 (2.7.x). Although v2 and v3 grammars look very similar, converting a v2 grammar to v3 is not easy.
- It is possible to buy a book written by Antlr's author. I did not want to buy it because I'm not planning to use Antlr in the future, but then I discovered that the online documentation is partial and often refers to the old 2.7.2 version.
- The generated Java classes can provide a method to get the AST, but as far as I understood, the tree cannot be used for an interpreter, only to check the result of parsing the statement.
- Building an interpreter is not the only reason to use a parser: Antlr is used for many other things, for example to compile, that is, to translate expressions from one grammar to another. Keep this in mind to avoid getting confused when reading the documentation.
I struggled with these problems for a couple of weeks. In the end I understood the following concepts and was able to produce my grammar:
- You can add Java code directly in the grammar. With this code you can build the interpreter structures directly in the generated Java classes.
- Be very careful about the case of the initial letter of rule names. Upper case: lexer rule; lower case: parser rule. It looks simple, but if the initial letter of even one rule name is wrong, nothing works as it should.
- The lexer is used to recognize single words (identifiers, strings, numbers); the parser is used to recognize phrases made of those words.
- Spaces do not need to appear in the parser rules: a lexer rule puts them on a hidden channel (or skips them), and the parser then ignores them.
- In the Java code that you add to the grammar you can set the package of the generated classes.
- AntlrWorks generates two types of Java classes: the ones to use in your application and the ones it uses for debugging. They are saved in the same place with the same names. The debug classes don't work in your application, so remember to regenerate the application classes after a debug session.
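To make the lexer/parser split and the embedded actions more concrete, here is a hand-written Java analogue. It is not code generated by Antlr. It covers a tiny hypothetical subset, `SELECT col, col FROM table`. The lexer turns characters into tokens (words) and drops whitespace. The `select()` method roughly corresponds to a lowercase parser rule. The lines that add to `columns` and set `table` play the role of Java actions filling member fields declared in a grammar. To keep the sketch short, keywords are lexed as plain identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration (NOT Antlr-generated code) of a lexer feeding a
// parser whose "actions" fill public member fields while it recognizes input.
class MiniSelect {

    // ---- Lexer: splits the input into single words (tokens) ----
    enum Type { ID, COMMA, EOF }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is dropped here, so the parser never sees it
            } else if (c == ',') {
                tokens.add(new Token(Type.COMMA, ","));
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(Type.ID, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        tokens.add(new Token(Type.EOF, ""));
        return tokens;
    }

    // ---- Parser: recognizes a phrase made of tokens ----
    // These fields play the role of the structures declared in the grammar.
    public final List<String> columns = new ArrayList<>();
    public String table;

    private List<Token> tokens;
    private int pos;

    static MiniSelect parse(String sql) {
        MiniSelect p = new MiniSelect();
        p.tokens = lex(sql);
        p.pos = 0;
        p.select();
        return p;
    }

    // Roughly: select : 'SELECT' ID (',' ID)* 'FROM' ID EOF ;
    private void select() {
        keyword("SELECT");
        columns.add(expect(Type.ID).text);   // action: remember the column
        while (peek().type == Type.COMMA) {
            pos++;                           // consume ','
            columns.add(expect(Type.ID).text);
        }
        keyword("FROM");
        table = expect(Type.ID).text;        // action: remember the table
        expect(Type.EOF);
    }

    private Token peek() { return tokens.get(pos); }

    private Token expect(Type type) {
        Token t = peek();
        if (t.type != type) {
            throw new IllegalStateException("Expected " + type + " but found '" + t.text + "'");
        }
        pos++;
        return t;
    }

    // Keywords are plain identifiers here, compared case-insensitively.
    private void keyword(String kw) {
        Token t = expect(Type.ID);
        if (!t.text.equalsIgnoreCase(kw)) {
            throw new IllegalStateException("Expected " + kw + " but found '" + t.text + "'");
        }
    }
}
```

With this sketch, `MiniSelect.parse("SELECT a, b FROM t")` leaves `columns` equal to `[a, b]` and `table` equal to `t`, while `"SELECT a FROM"` is rejected with an `IllegalStateException` because the table name is missing.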
So, if you take a look at the grammar I wrote, you'll see that it is messy (rule definitions mixed with Java code), but it works. When I run the generated Java classes against an SQL expression, the structures declared in the @members block are populated with the correct values, and from them I can interpret the expression.