Thursday, October 04, 2007

Antlr for the SQL module

To build the internal SQL module for Matrex I decided to work in the following way:
  • Write a parser that converts the SQL expression into an internal object structure
  • Write the code that applies the parsed SQL to the matrix/vector arguments of the SQL function.
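The two steps above can be sketched in Java roughly as follows. This is only an illustration of the idea, not the Matrex code: the names SelectSketch, SelectQuery, Matrix and apply are hypothetical, and the only SQL feature shown is picking columns by name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: none of these classes come from Matrex or Antlr.
class SelectSketch {

    // Step 1 output: the internal object structure a parser would fill in.
    static class SelectQuery {
        final List<String> columns;
        final String table;

        SelectQuery(List<String> columns, String table) {
            this.columns = columns;
            this.table = table;
        }
    }

    // Stand-in for a matrix argument: named columns plus rows of values.
    static class Matrix {
        final List<String> columnNames;
        final List<double[]> rows;

        Matrix(List<String> columnNames, List<double[]> rows) {
            this.columnNames = columnNames;
            this.rows = rows;
        }
    }

    // Step 2: apply the parsed query to a matrix (column projection only).
    static Matrix apply(SelectQuery query, Matrix input) {
        int[] indexes = new int[query.columns.size()];
        for (int i = 0; i < indexes.length; i++) {
            String name = query.columns.get(i);
            int index = input.columnNames.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            indexes[i] = index;
        }
        List<double[]> projectedRows = new ArrayList<double[]>();
        for (double[] row : input.rows) {
            double[] projected = new double[indexes.length];
            for (int i = 0; i < indexes.length; i++) {
                projected[i] = row[indexes[i]];
            }
            projectedRows.add(projected);
        }
        return new Matrix(query.columns, projectedRows);
    }

    public static void main(String[] args) {
        Matrix orders = new Matrix(
                Arrays.asList("id", "price", "qty"),
                Arrays.asList(new double[] {1, 9.5, 3}, new double[] {2, 4.0, 7}));
        // What a parser might produce for: SELECT qty, id FROM orders
        SelectQuery query = new SelectQuery(Arrays.asList("qty", "id"), "orders");
        Matrix result = apply(query, orders);
        System.out.println(result.columnNames + " " + Arrays.toString(result.rows.get(0)));
    }
}
```

Keeping the query object separate from the code that evaluates it means the parser can be replaced without touching the part that works on the matrices.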
To do the parsing work, I chose the Antlr library.
Antlr has the following advantages compared to other parser generators:
  • More people use it (at least it looks that way)
  • The generated parser can be code in different languages (Java, C#, Python...), which can be useful if I want to port the same grammar to other projects
  • Together with the library you can download a graphical application called AntlrWorks to interactively test and debug your grammar. AntlrWorks is a very good tool that lets you find errors in your grammar before you start to use it.
Antlr is a wonderful product, but I struggled to create the SQL grammar I needed.
The reasons are probably:
  • my inexperience in terms of parsers/lexers
  • some confusion and some holes in the free documentation
My initial idea was to download Antlr, get a grammar describing the SQL SELECT statement, adapt the grammar to my needs, build the Java sources from it and convert the produced ASTs to my internal structures.

Simple, right? Wrong. Here are the problems with this approach:
  1. Antlr is now at version 3 (normally called v3), but most of the example grammars were written for version 2 (2.7.x). Although v2 and v3 grammars look very similar, converting a v2 grammar to v3 is not easy.
  2. It is possible to buy a book written by Antlr's author. I did not want to buy the book because I'm not planning to use Antlr in the future. But then I discovered that the online documentation is incomplete and often refers to the old 2.7.2 version.
  3. The generated Java classes can have a method to get the AST, but as far as I understood the tree cannot be used for an interpreter, only to check the result of parsing the statement.
  4. Building an interpreter is not the only purpose of a parser: Antlr is used for many other things, for example compiling, which means converting expressions from one grammar to another. Keep this in mind to avoid getting confused when reading the documentation.

I struggled for a couple of weeks with these problems. In the end I was able to understand the following concepts and to produce my grammar:

  • You can add Java code directly in the grammar. With this code you can build the interpreter structures directly in the generated Java code.
  • Be very careful about the case of the initial letter of the rule names. Upper case: lexer rule; lower case: parser rule. It looks simple, but if the initial letter of even one rule name is wrong, nothing works as it should.
  • The lexer is used to parse single words (identifiers, strings, numbers). The parser is used to parse phrases.
  • Spaces do not have to be handled in the parser rules: a whitespace rule in the lexer can hide them, so the parser never sees them.
  • In the Java code that you add to the grammar you can set the package of the generated classes.
  • AntlrWorks generates two types of Java classes: the ones to use in your application and the ones that it uses to debug. They are saved in the same place with the same names. The debug classes don't work in your application, so remember to generate the application classes after a debug session.
So, if you take a look at the grammar I have written, you'll see that it is messy (rule definitions mixed with Java code), but it works. When I run the generated Java classes against an SQL expression, the structures declared in the @members block are populated with the correct values, and from them I can interpret the expression.
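To make the lexer/parser split and the "fill the structures while parsing" idea more concrete, here is a small hand-written Java sketch. It is not code generated by Antlr and not my grammar: the class MiniSelectParser and its methods are hypothetical, and it only understands statements of the form select column, column from table. The fields columns and table play the role that fields declared in an @members block play in the generated parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; the classes Antlr generates look very different.
class MiniSelectParser {

    // Filled in while parsing, in the same spirit as fields in an @members block.
    final List<String> columns = new ArrayList<String>();
    String table;

    private final List<String> tokens;
    private int pos = 0;

    MiniSelectParser(String sql) {
        this.tokens = tokenize(sql);
    }

    // "Lexer": splits the text into single words and commas, dropping whitespace.
    static List<String> tokenize(String sql) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace never reaches the parser
            } else if (c == ',') {
                out.add(",");
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < sql.length()
                        && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) {
                    i++;
                }
                out.add(sql.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }

    // "Parser": recognises the phrase  select ID (',' ID)* from ID
    void parse() {
        expectKeyword("select");
        columns.add(identifier());
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos++;
            columns.add(identifier());
        }
        expectKeyword("from");
        table = identifier();
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
    }

    private void expectKeyword(String keyword) {
        if (pos >= tokens.size() || !tokens.get(pos).equalsIgnoreCase(keyword)) {
            throw new IllegalArgumentException("Expected '" + keyword + "' at token " + pos);
        }
        pos++;
    }

    private String identifier() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Expected an identifier at end of input");
        }
        String token = tokens.get(pos);
        if (token.equals(",") || token.equalsIgnoreCase("select") || token.equalsIgnoreCase("from")) {
            throw new IllegalArgumentException("Expected an identifier but found: " + token);
        }
        pos++;
        return token;
    }

    public static void main(String[] args) {
        MiniSelectParser parser = new MiniSelectParser("SELECT price, qty FROM orders");
        parser.parse();
        System.out.println(parser.columns + " from " + parser.table);
    }
}
```

After parse() returns, the fields hold everything an interpreter needs, so no tree has to be walked afterwards.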




4 comments:

imoldcat said...

Thanks for sharing. I've encountered the same problem as you. I started using Antlr just a few days ago, and the reason I chose it was the supposedly good documentation. At least now I know that is not the case for v3.

combray said...

The book is great. I'm not sure why you wouldn't count that as superior documentation for v3 instead of v2 -- which it clearly is. It would be pretty silly to use ANTLR and not get the book.

Ricardo Ferreira said...

Thanks a lot!

Having worked with Lex and Yacc in the past, it was somewhat difficult to understand how ANTLR works. And you just answered my doubts.

Anonymous said...

That's what I was looking for..... Thanks a lot