Reusing MySQL parser

I am working on an SQL Intrusion Detection System (IDS) and I need to parse incoming SQL queries. Writing your own SQL parser is a long-term task, and it will never exactly reflect the behavior of MySQL's own parser. I found out that MySQL has a lexical analyzer, whose main source file is sql/sql_lex.cc, and a parser generated by Bison from sql/sql_yacc.y. I am really interested in reusing these reliable components. I am building my IDS in C/C++, so I am looking for a way to connect the MySQL parser to my detection system.

I was wondering whether it is possible to reuse the MySQL parser (lexer + parser) to get the structure of an SQL query in some logical form, for example a syntax tree. Would that be possible? Are there any related texts, tutorials or projects?

Thanks

+6
2 answers

I completed the first version of my IDS as part of my bachelor's project. It is implemented as a plugin for MySQL.

Below I list the main sources I used to understand MySQL internals. Then I briefly describe the approach I used in my IDS.

Texts on MySQL internals

  • I found the books Expert MySQL by Charles Bell and Understanding MySQL Internals by Sasha Pachev (as mentioned by user3822447) to be a very good entry point for understanding MySQL internals.
  • MySQL 5.1 Plugin Development by Sergei Golubchik and Andrew Hutchings is also very useful.
  • The MySQL Internals Manual also provides some basic information to get you started.
  • After all the reading, I did some debugging (using VS) and worked out what the structure of the query tree looks like.

My IDS solution

The source code of my solution can be found on SourceForge. I plan to document it a bit more on its wiki.

The main entry point is the audit_ids_notify() function in audit_ids.cc. The plugin takes the query tree generated by MySQL's internal parser and builds a simplified version of it (to save memory). It then performs anomaly detection: it keeps a list of known query tree structures and stores some statistical information about each parameterizable part of each structure. The output is written to a dedicated log file in the MySQL data directory.

I tried to make the solution modular and extensible. The initial version is a kind of demonstration, and performance is not optimized, especially in the SQL storage module.
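The anomaly-detection step described above can be illustrated with a minimal sketch. This is not the code from the actual plugin: the Node type, the signature() function, the Detector class and the choice of literal length as the stored statistic are all assumptions made for illustration, and the tree is built by hand instead of being taken from MySQL's parser.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified query-tree node (not a MySQL structure).
struct Node {
    std::string kind;            // e.g. "SELECT", "EQ", "COLUMN:name", "LITERAL"
    std::string value;           // only meaningful when kind == "LITERAL"
    std::vector<Node> children;
};

// Convenience constructors for the illustration.
Node lit(std::string v) { return Node{"LITERAL", std::move(v), {}}; }
Node op(std::string kind, std::vector<Node> children = {}) {
    return Node{std::move(kind), "", std::move(children)};
}

// Serialize the structure of the tree. Each literal becomes '?' and its
// value is appended to params, in left-to-right order.
void signature(const Node& n, std::string& sig, std::vector<std::string>& params) {
    if (n.kind == "LITERAL") {
        sig += '?';
        params.push_back(n.value);
        return;
    }
    sig += n.kind;
    if (!n.children.empty()) {
        sig += '(';
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) sig += ',';
            signature(n.children[i], sig, params);
        }
        sig += ')';
    }
}

// Assumed statistic: the range of literal lengths seen during learning.
struct LengthRange {
    std::size_t min_len;
    std::size_t max_len;
};

class Detector {
public:
    // Record a query tree that is known to be legitimate.
    void learn(const Node& root) {
        std::string sig;
        std::vector<std::string> params;
        signature(root, sig, params);
        auto it = profiles_.find(sig);
        if (it == profiles_.end()) {
            std::vector<LengthRange> ranges;
            for (const auto& p : params) ranges.push_back({p.size(), p.size()});
            profiles_.emplace(sig, std::move(ranges));
            return;
        }
        for (std::size_t i = 0; i < params.size() && i < it->second.size(); ++i) {
            LengthRange& r = it->second[i];
            r.min_len = std::min(r.min_len, params[i].size());
            r.max_len = std::max(r.max_len, params[i].size());
        }
    }

    // A query is anomalous if its structure is unknown or if any literal
    // falls outside the length range learned for its position.
    bool is_anomalous(const Node& root) const {
        std::string sig;
        std::vector<std::string> params;
        signature(root, sig, params);
        auto it = profiles_.find(sig);
        if (it == profiles_.end()) return true;
        if (it->second.size() != params.size()) return true;
        for (std::size_t i = 0; i < params.size(); ++i) {
            const LengthRange& r = it->second[i];
            if (params[i].size() < r.min_len || params[i].size() > r.max_len) return true;
        }
        return false;
    }

private:
    std::map<std::string, std::vector<LengthRange>> profiles_;
};
```

In this sketch a tree such as op("SELECT", {op("EQ", {op("COLUMN:name"), lit("bob")})}) has the signature SELECT(EQ(COLUMN:name,?)). An injected OR branch changes the signature, so it is reported even if every literal looks harmless on its own.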

MySQL plugin type

I identified two possible approaches and used the first.

  1. audit plugin
    • The plugin type I used in my solution is the audit plugin.
    • I used this plugin type even though it is intended for reporting server operations (for example, logging queries or errors).
    • I chose it because I found it to be the only natively supported plugin type that is called after the query tree is complete (i.e., the query has been parsed) and before it is freed from memory (as of MySQL 5.6.17).
    • Disadvantage: this behavior is not guaranteed in future versions of MySQL, but in my opinion it should not change in the near future.
    • Advantage: MySQL does not need to be recompiled. It is enough to build and install the plugin.

  2. query-rewrite plugin
    • An alternative approach is to use the non-native query-rewrite plugin type. It provides a plugin API for modifying the query as well as reading it.
    • Disadvantage: to support this plugin API, the MySQL server must be recompiled with it. I think it could become part of the MySQL distribution.
    • Advantage: this plugin type is designed for reading and writing the internal query tree.

If you have any questions about this topic, feel free to ask ;)

+8

I think this is possible. Try an advanced book on MySQL internals, such as Charles Bell's Expert MySQL or Sasha Pachev's Understanding MySQL Internals. MySQL uses a custom hand-written lexer together with a standard Bison-generated parser, and the lexer is written to be compatible with that parser.

You may also find a solution that is simpler than fully parsing the query, for example:

  • Strategy #1: Discard the query itself and look only at the contents of the strings inside it. Look for signs of an attack, such as SQL keywords. This can detect attempted attacks.
  • Strategy #2: Discard all user input and keep the rest of the query. Build a list of all the resulting query templates and compare them with each other. Look for queries with an anomalous structure, which indicates that someone has successfully modified the query.
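
Strategy #2 can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the function names query_template() and matches_known_template() are made up, literals are recognized with a naive scanner instead of MySQL's lexer, and comments, hex literals and exponent notation are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// True if the output built so far ends inside an identifier, so that a
// following digit belongs to a name such as "t1" and is not a literal.
static bool ends_in_identifier(const std::string& out) {
    if (out.empty()) return false;
    const unsigned char last = static_cast<unsigned char>(out.back());
    return std::isalnum(last) || last == '_';
}

// Hypothetical helper: reduce a query to a template by replacing quoted
// strings and numeric literals with '?', collapsing whitespace and
// upper-casing everything else.
std::string query_template(const std::string& sql) {
    std::string out;
    const std::size_t n = sql.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (c == '\'' || c == '"') {
            // Skip a string literal, honoring backslash escapes and
            // doubled quotes. An unterminated literal runs to the end.
            const char quote = sql[i];
            ++i;
            while (i < n) {
                if (sql[i] == '\\' && i + 1 < n) { i += 2; continue; }
                if (sql[i] == quote) {
                    if (i + 1 < n && sql[i + 1] == quote) { i += 2; continue; }
                    ++i;
                    break;
                }
                ++i;
            }
            out += '?';
        } else if (std::isdigit(c) && !ends_in_identifier(out)) {
            // Skip a simple numeric literal such as 42 or 3.14.
            while (i < n && (std::isdigit(static_cast<unsigned char>(sql[i])) || sql[i] == '.')) ++i;
            out += '?';
        } else if (std::isspace(c)) {
            // Collapse any run of whitespace into a single space.
            while (i < n && std::isspace(static_cast<unsigned char>(sql[i]))) ++i;
            if (!out.empty() && out.back() != ' ') out += ' ';
        } else {
            out += static_cast<char>(std::toupper(c));
            ++i;
        }
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}

// A query whose template is not in the known set has an unexpected structure.
bool matches_known_template(const std::unordered_set<std::string>& known,
                            const std::string& sql) {
    return known.count(query_template(sql)) > 0;
}
```

With this sketch, a query like SELECT * FROM users WHERE name = 'bob' AND id = 42 reduces to the template SELECT * FROM USERS WHERE NAME = ? AND ID = ?, while a successful injection such as name = '' OR 1=1 produces a different template and therefore stands out.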

I am not an SQL guru, but the simplest strategy is to use parameterized queries and ignore intrusion attempts. Most such attempts on the internet are generic, random requests designed to find obvious weaknesses, and they can be safely ignored if you follow basic security practices everywhere.

+1