Wednesday, October 8, 2008

Lex It! Yacc It!

Lately I have been working a lot with the two legendary parser generator tools, ‘Lex’ and ‘Yacc’. (Well it’s actually ‘Flex’ and ‘Bison’ to be precise) To be honest I didn’t have any experience with Lex and Yacc two weeks back. But the two applications are so easy to learn and master that anyone could become an expert compiler developer in no time. Practically speaking one could start developing parsers and compilers without any prior knowledge on compiler theory or language parsing. All you need to get started is some experience in C programming. However I have to admit that some knowledge on regular expressions, context free grammars and push down automatons can make your life lot easier since that would help you better understand the syntax of the two configuration languages used by Lex and Yacc. 

Lex (or Flex) generates a C function called yylex which is capable of scanning and tokenizing a given input. The input to Lex primarily consists of regular expressions and C code segments to be executed on detection of strings that match the regular expression definitions. Yacc (or Bison) generates a C function called yyparse which accepts a sequence of tokens from yylex and parses them. The YACC input would include the productions of the language grammar and C code segments to be executed on application of the productions. 

It’s truly amazing how easy to develop parsers to accept complex languages with Lex and Yacc. The more I use it more I like it. The syntax of the configuration files are well structured and so easy to understand. I was so inspired by these two applications that I started developing my own command-line mathematics package for Unix/Linux systems. My plan is to develop something which is similar to the command interpreter that comes with MATLAB. I have already implemented support for executing simple mathematical expressions, trigonometric functions, logarithms and variables. I’m currently trying to implement the support for user defined functions. Once that is done I intend to implement support for matrix algebra and simple statistics.

The above sounds like a lot of work but thanks to Lex and Yacc I have to write only a very small amount of code. The two magic tools do the job by generating hundreds of lines of C code for me. Of course needless to say that developing something similar from the scratch would surely take months.

I’ve always wondered how come there are so many programming languages. But now I think I know the answer. It’s so freaking easy to develop parsers and compilers for a language with tools like Lex and Yacc. Once you have a language expressed in terms of a set of productions it takes only a few minutes to develop a fully functional compiler. So anyone can come up with an idea for a new language and turn it into reality in a flash. In fact I’m thinking about doing the same in the near future :-)

If you are like me and want to learn and master Lex and Yacc here is a cool reference guide. It helped me out a lot and I’m sure it will do the same for you.

1 comment:

rebellion said...

Sir, please guide me. I want to make a language for simplifying the data structure implementation.