Why is the dynamic selection of column and table names so difficult in SQL?

I believe there must be a specific design reason why you cannot write a query such as the following:

select (select column_name from information_schema.columns where column_name not like '%rate%' and table_name = 'Fixed_Income') from Fixed_Income

and instead have to resort to dynamic SQL.

Does anyone know what this reason is? I searched for it, but all the hits were people asking for help with the problem, which suggests that this is a fairly common need and that the reason is not well understood.

+4
2 answers

You ask a very interesting question.

The “relational” in “relational algebra” refers to tuples of name-value pairs, not to relationships between tables. In relational algebra, there is no requirement that all records in a set (table) have the same columns.

My best guess is that the restriction comes from the idea of entity-relationship diagrams. A database consists of tables, and these tables are related to each other. A relational database was a deliberate choice for storing and accessing data when the data fit this form. Knowing the entities and their attributes requires a static shape for the data and, therefore, static references in queries.

In addition, SQL is a declarative language, not a procedural one. This suggests, but does not require, a compilation stage separate from the execution of the query. In general, the SQL engine does the following (at a very high level):

  • Compiles the query, usually into some kind of data flow process.
  • Optimizes the data flow. (This is usually part of the compilation step.)
  • Runs the query.

The first two steps produce what is called a "query plan." However, you cannot really optimize a query without knowing the objects it works on. Thus, dynamic selection of tables and columns would make optimization part of running the query rather than compiling it.
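A minimal sketch of why the plan depends on concrete object names, using Python's built-in sqlite3 module. SQLite, the table t, the index idx_t_a and the helper plan_details are all assumptions made for illustration; none of them come from the question. A value can be left as a placeholder when the statement is compiled, but a table name cannot:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("CREATE INDEX idx_t_a ON t (a)")

def plan_details(sql, params=()):
    """Return the human-readable steps of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the description

# The table is known, so a plan (here, an index lookup) can be chosen
# when the statement is compiled.
known_table_plan = plan_details("SELECT b FROM t WHERE a = ?", (1,))

# A placeholder in place of the table name is rejected during compilation.
try:
    conn.execute("SELECT b FROM ? WHERE a = 1", ("t",))
    table_placeholder_error = None
except sqlite3.OperationalError as exc:
    table_placeholder_error = str(exc)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only collects it rather than relying on a particular string.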

Finally, some databases, such as SQL Server, support dynamic SQL. This lets you build strings that are compiled and run at the same time. This is very useful for complex decision support queries. It is not recommended when you need fast transaction throughput, because the compilation overhead is too large relative to the query itself.
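As a rough illustration of the dynamic SQL idea, here is a sketch in Python with the built-in sqlite3 module rather than SQL Server (where the same thing would typically go through EXEC or sp_executesql). The function count_rows and its validation against sqlite_master are hypothetical choices for this example:

```python
import sqlite3

def count_rows(conn, table_name):
    """Build and run a query whose table name is only known at run time."""
    # Hypothetical safety check: accept only names of tables that exist.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table_name not in known:
        raise ValueError(f"unknown table: {table_name!r}")
    # The statement text is assembled here, then compiled and run in one step.
    quoted = '"' + table_name.replace('"', '""') + '"'
    sql = f"SELECT COUNT(*) FROM {quoted}"
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fixed_Income (id INTEGER, coupon_rate REAL)")
conn.executemany("INSERT INTO Fixed_Income VALUES (?, ?)", [(1, 0.05), (2, 0.03)])
row_count = count_rows(conn, "Fixed_Income")
```

Because the name is spliced into the statement text, it has to be validated or quoted by the caller; the engine only sees the finished string.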

+1

The reason is that the query optimizer needs to know, at compile time, the exact schema objects you reference. It needs them to optimize the query. You would not believe how slow an RDBMS would be if this information were not available to the query optimizer.

This is a bit like the performance difference between static and dynamic typing in practice: there is usually a non-trivial difference (I am thinking of mainstream languages here). The compiler can use static information to generate great code.

Even if this feature existed, it would be implemented by first computing the table and column names and then performing standard “static” query planning.
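The two phases described above can be sketched by hand for the query from the question. This is only an illustration under stated assumptions: it uses Python's sqlite3 module, a made-up Fixed_Income table with an issuer column, and SQLite's pragma_table_info (available as a table-valued function in reasonably recent SQLite builds) as a stand-in for information_schema.columns, which SQLite does not provide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Fixed_Income (id INTEGER, issuer TEXT, coupon_rate REAL)")
conn.execute("INSERT INTO Fixed_Income VALUES (1, 'ACME', 0.05)")

# Phase 1: compute the column names from the catalog.
columns = [row[0] for row in conn.execute(
    "SELECT name FROM pragma_table_info('Fixed_Income') "
    "WHERE name NOT LIKE '%rate%' ORDER BY cid")]

# Phase 2: build an ordinary query with fixed names, which the engine
# then compiles and plans like any other static statement.
column_list = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
rows = conn.execute(f"SELECT {column_list} FROM Fixed_Income").fetchall()
```

By the time the second statement reaches the engine, every name in it is fixed, which is exactly the situation the optimizer expects.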

+6

Source: https://habr.com/ru/post/1416454/

