Parsing / tokenizing a string containing an SQL command

Are there any open source libraries (any language, Python / PHP preferred) that will tokenize / parse an ANSI SQL string into its various components?

That is, if I had the following line

SELECT a.foo, b.baz, a.bar FROM TABLE_A a LEFT JOIN TABLE_B b ON a.id = b.id WHERE baz = 'snafu'; 

I would get back a data structure / object, something like

  //fake PHPish
  $results['select-columns'] = Array[a.foo,b.baz,a.bar];
  $results['tables'] = Array[TABLE_A,TABLE_B];
  $results['table-aliases'] = Array[a=>TABLE_A, b=>TABLE_B];
  //etc...

Again, I'm looking for the code in a database package that teases the SQL command apart so that the engine knows what to do with it. Searching the Internet yields lots of results on how to parse a string WITH SQL. That is not what I want.

I realize I could dig through the code of an open source database to find what I want, but I was hoping for something a little more ready-made (although if you know where in the MySQL, PostgreSQL, or SQLite source to look, feel free to pass it along).

Thanks!

+6
python sql php tokenize parsing
1 answer

There is a file in the SQLite source called parse.y that contains the grammar for SQL. You can pass this file to the Lemon parser generator to generate C code that implements the grammar.
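A generated parser works on a stream of tokens, so the first step is always to split the SQL string into tokens. As a rough illustration of that step only, here is a minimal sketch in Python using just the standard `re` module. It is not SQLite's tokenizer and not a full SQL parser: the token classes, the `KEYWORDS` set, and the `tokenize` and `table_aliases` helpers are all hypothetical names chosen for this example, and they cover little more than the query in the question.

```python
import re

# Hypothetical token classes for illustration; a real SQL lexer has many more.
TOKEN_SPEC = [
    ("WHITESPACE", r"\s+"),
    ("STRING", r"'(?:[^']|'')*'"),          # single-quoted literal, '' escapes a quote
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OPERATOR", r"<>|<=|>=|!=|[=<>+\-*/]"),
    ("PUNCT", r"[.,;()]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Assumption: only the keywords needed for the example query.
KEYWORDS = {"SELECT", "FROM", "LEFT", "JOIN", "ON", "WHERE"}


def tokenize(sql):
    """Split an SQL string into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WHITESPACE":
            continue
        if kind == "NAME" and text.upper() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


def table_aliases(tokens):
    """Map alias -> table for the simple pattern 'FROM|JOIN <table> <alias>'."""
    aliases = {}
    for i, (kind, text) in enumerate(tokens[:-2]):
        if kind == "KEYWORD" and text.upper() in {"FROM", "JOIN"}:
            (kind1, table), (kind2, alias) = tokens[i + 1], tokens[i + 2]
            if kind1 == "NAME" and kind2 == "NAME":
                aliases[alias] = table
    return aliases


query = ("SELECT a.foo, b.baz, a.bar FROM TABLE_A a "
         "LEFT JOIN TABLE_B b ON a.id = b.id WHERE baz = 'snafu';")
tokens = tokenize(query)
aliases = table_aliases(tokens)
```

For the query in the question, `aliases` maps `a` to `TABLE_A` and `b` to `TABLE_B`. Anything beyond this simple shape (subqueries, `AS`, quoted identifiers, comments) is where a real grammar such as the one in parse.y becomes necessary.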

+2