Creating a DSL Query Language

I am working on a project (written in Django) that has only a few entities, but many rows for each entity.

In my application, I have several static "reports" written directly in plain SQL. Users can also search the database through a common filter form. Since the target audience is quite technical, and at some point the filter form will no longer meet their needs, I'm thinking about creating a query language for my database, something like YQL or Jira's advanced search.

I found http://sourceforge.net/projects/littletable/ and http://www.quicksort.co.uk/DeeDoc.html, but it seems that they only work with objects in memory. Since the database may be too large to hold in memory, I would prefer that the query be translated into SQL (or, better, a Django query) before doing the actual work.

Are there any libraries or best practices for doing this?

+7
4 answers

Writing such a DSL is actually surprisingly easy with PLY, and, what do you know, there is already an example of doing what you want in Django. You see, Django has this handy thing called the Q object, which makes the Django side of the task very simple.

At DjangoCon EU 2012, Matthieu Amiguet gave a session entitled "Implementing Domain-Specific Languages in Django Applications" in which he went through this process, right up to implementing the kind of DSL you need. His slides, which include everything you need, are available on his website. The final code (linked from the last slide, anyway) is available at http://www.matthieuamiguet.ch/media/misc/djangocon2012/resources/compiler.html.

Reinout van Rees also wrote up some good notes on this session. (He usually does!) They cover a bit of the missing context.

In the above examples, you see something very similar to YQL and JQL:

  • groups__name="XXX" AND NOT groups__name="YYY"
  • (modified > 1/4/2011 OR NOT state__name="OK") AND groups__name="XXX"

It can also be easily changed; for example, you can use groups.name rather than groups__name (I would). This modification is fairly trivial: allow "." in the FIELD token by modifying t_FIELD, and then replace "." with "__" before building the Q object in p_expression_ID.

So this covers simple queries; it also gives you a good starting point if you want to build a more sophisticated DSL.
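To make the idea concrete, here is a rough sketch of such a translator. It is not the code from the talk (which uses PLY); it hand-writes a small tokenizer and recursive-descent parser with the standard library only, then maps the parse tree onto Q objects. Every name in it (tokenize, Parser, parse, to_q, LOOKUPS) is made up for this example, reading dates as day/month/year is an assumption, and to_q needs Django installed.

```python
import re
from datetime import datetime

# One alternative per token kind; names are illustrative only.
TOKEN_RE = re.compile(r"""\s*(?:
      (?P<LPAREN>\() | (?P<RPAREN>\))
    | (?P<OP>>=|<=|=|>|<)
    | (?P<STRING>"[^"]*")
    | (?P<DATE>\d{1,2}/\d{1,2}/\d{4})
    | (?P<WORD>[A-Za-z_][A-Za-z0-9_.]*)
    )""", re.VERBOSE)

# DSL comparison operator -> Django lookup suffix.
LOOKUPS = {"=": "", ">": "__gt", "<": "__lt", ">=": "__gte", "<=": "__lte"}


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "WORD" and value.upper() in ("AND", "OR", "NOT"):
            kind = value.upper()  # keywords get their own token kind
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:
    # expr := term (OR term)* ; term := factor (AND factor)*
    # factor := NOT factor | "(" expr ")" | WORD OP (STRING | DATE)
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self, kind):
        if self.peek() != kind:
            raise ValueError(f"expected {kind}, got {self.peek()}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def expr(self):
        node = self.term()
        while self.peek() == "OR":
            self.take("OR")
            node = ("or", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "AND":
            self.take("AND")
            node = ("and", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "NOT":
            self.take("NOT")
            return ("not", self.factor())
        if self.peek() == "LPAREN":
            self.take("LPAREN")
            node = self.expr()
            self.take("RPAREN")
            return node
        # Accept groups.name as well as groups__name.
        field = self.take("WORD").replace(".", "__")
        op = self.take("OP")
        if self.peek() == "DATE":
            # Assumption: dates are written day/month/year.
            value = datetime.strptime(self.take("DATE"), "%d/%m/%Y").date()
        else:
            value = self.take("STRING")[1:-1]  # strip the quotes
        return ("cmp", field + LOOKUPS[op], value)


def parse(text):
    parser = Parser(tokenize(text))
    node = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return node


def to_q(node):
    from django.db.models import Q  # lazy import: parsing alone needs no Django
    if node[0] == "cmp":
        return Q(**{node[1]: node[2]})
    if node[0] == "not":
        return ~to_q(node[1])
    left, right = to_q(node[1]), to_q(node[2])
    return left & right if node[0] == "and" else left | right
```

With a hypothetical model Item, usage would look like Item.objects.filter(to_q(parse(user_input))). In a real application you would probably also want to check field names against a whitelist before building the Q object, so users cannot traverse relations you did not intend to expose.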

+12

I faced exactly this problem: a large database that needs to be searched. I made some static reports and some fancy filters using Django (very simple with Django), just like you.

However, the power users demanded more. I decided that there was already a DSL that they all knew: SQL. The question was how to make it safe enough.

So I used Django permissions to let authorized power users create SQL queries, which are stored in a new table. I then made these queries available to users with fewer privileges. I also made the queries accept extra parameters. The queries were executed using the lower-level Python DB-API, which Django uses under the hood for its ORM anyway.

The real trick was to run these queries over a read-only database connection, which guarantees that they cannot perform updates. I made the read-only connection by creating another database user with lower permissions and opening a dedicated connection for it.

TL;DR: SQL is the way to go!

+2

Depending on the shape of your data, the types of queries your users need to run, and how often your data is updated, an alternative to the pure SQL solution proposed by Nick Craig-Wood is to index your data in Solr and then run queries against it.

Solr adds a layer of complexity (configuration, data synchronization), but it is very fast, can handle large amounts of data, and provides a (relatively) intuitive query language.
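For a feel of what that query language looks like, the sketch below builds a request URL for Solr's select handler from a user-typed query string. The host, the core name items, the field names, and the solr_select_url helper are all assumptions for illustration; it only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def solr_select_url(base_url, core, query, rows=10):
    """Build a URL for the select handler of a (hypothetical) Solr core."""
    # urlencode takes care of escaping quotes, colons and spaces in the query.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


# A query in Solr's standard syntax: field:"value" combined with AND / OR / NOT.
user_query = 'groups_name:"XXX" AND NOT state_name:"OK"'
url = solr_select_url("http://localhost:8983", "items", user_query)
```

The user-facing syntax here is Solr's own, so there is no parser to write; the cost is keeping the index in sync with the database.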

+1

You can write your own SQL-ish language using pyparsing. There is even a rather verbose example that you could expand on.

+1
