Is there any rule of thumb for creating an SQL query from a human-readable description?

Whenever we are given a description of a query, we fall back on heuristics and brainstorming to construct it.

Is there a systematic step-by-step or mathematical way to construct an SQL query from a given human-readable description?

For example, how do we determine whether a query needs a join rather than a subquery, whether it needs a GROUP BY, whether it needs an IN clause, and so on?

For example, anyone who has studied digital electronics will know methods such as the Karnaugh map or Quine-McCluskey. These are systematic approaches to simplifying digital logic.

Is there any method like these for analyzing SQL queries by hand, so that we can avoid brainstorming every time?

2 answers

Is there a systematic step-by-step or mathematical way to construct an SQL query from a given human-readable description?

Yes there is.

It turns out that natural language expressions, logic expressions, relational algebra expressions, and SQL expressions (a hybrid of the last two) correspond in a rather direct way. (What follows assumes no duplicate rows and no nulls.)

Each table (base table or query result) has an associated predicate: a natural language fill-in-the-(named-)blanks statement template parameterized by column names.

[liker] likes [liked] 

A table holds every row that, when its column values are used to fill in the (named) blanks, makes a true statement, also known as a proposition.

    liker | liked
    --------------
    Bob   | Dex     /* Bob likes Dex */
    Bob   | Alice   /* Bob likes Alice */
    Alice | Carol   /* Alice likes Carol */

Every proposition made by filling the predicate with the values from a row in the table is true. And every proposition made by filling the predicate with the values from a row that is not in the table is false.

    /*
        Alice likes Carol
    AND NOT Alice likes Alice
    AND NOT Alice likes Bob
    AND NOT Alice likes Dex
    AND NOT Alice likes Ed
    ...
    AND Bob likes Alice
    AND Bob likes Dex
    AND NOT Bob likes Bob
    AND NOT Bob likes Carol
    AND NOT Bob likes Ed
    ...
    AND NOT Carol likes Alice
    ...
    AND NOT Dex likes Alice
    ...
    AND NOT Ed likes Alice
    ...
    */

The DBA gives the predicate for each base table. The SQL syntax for referring to a table is a lot like the traditional logic shorthand for the natural language version of its predicate.

    /* (liker, liked) rows where [liker] likes [liked] */
    /* (liker, liked) rows where Likes(liker, liked) */
    SELECT * FROM Likes
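The correspondence between a table and its predicate can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in `sqlite3` module as a stand-in DBMS; the helper name `likes` and the `propositions` list are hypothetical names introduced only for this example.

```python
import sqlite3

# In-memory database holding the Likes table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))"
)
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")],
)

# SELECT * FROM Likes returns the (liker, liked) rows where Likes(liker, liked).
rows = set(conn.execute("SELECT * FROM Likes"))


def likes(liker, liked):
    """Predicate Likes(liker, liked): true exactly when the row is in the table."""
    return (liker, liked) in rows


# Filling the template "[liker] likes [liked]" with each row gives a true proposition.
propositions = [f"{liker} likes {liked}" for liker, liked in sorted(rows)]
print(propositions)
```

Rows that are absent from the table, such as `("Alice", "Bob")`, make the predicate false, which mirrors the long AND NOT list above.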

An SQL query (sub)expression transforms its argument table values into a new table value holding the rows that make a true statement from a new predicate. The new table's predicate can be expressed in terms of the argument tables' predicates according to the (sub)expression's relational/table operators. A query is an SQL expression whose predicate is the predicate for the table of rows we want.

Inside a SELECT:
• A base table named T with alias A holds the rows where T(A.C,...).
• R CROSS JOIN S and R INNER JOIN S hold the rows where the predicate of R AND the predicate of S. (The rows are each a combination of a row from each argument, where an argument with alias A has its columns C,... renamed to A.C,....)
• R ON condition and R WHERE condition hold the rows where the predicate of R AND condition.
• SELECT DISTINCT A.C AS D,... FROM R (possibly with implicit A. and/or implicit AS D) holds the rows where FOR SOME [value for] the dropped columns, the predicate of R with A.C,... replaced by D,.... (Dropped columns are not parameters of the new predicate.)
  • Equivalently, SELECT DISTINCT A.C AS D,... FROM R holds the rows where FOR SOME A.*,..., A.C = D AND ... AND the predicate of R. (This can be less compact, but it looks more like the SQL.)
• (X,...) IN (R) means the predicate of R with columns C,... replaced by X,....
  • So (...) IN (SELECT * FROM T) means T(...).
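Two of these rules are easy to check directly. The following is a rough sketch, assuming Python's built-in `sqlite3` module and an SQLite version with row-value support (3.15 or later); the helper `holds` and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))"
)
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")],
)

# SELECT DISTINCT drops the column `liked`, so the result holds the
# (person) rows where FOR SOME liked, Likes(person, liked).
people_who_like_someone = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT liker AS person FROM Likes")
)


def holds(liker, liked):
    """(liker, liked) IN (SELECT * FROM Likes) means Likes(liker, liked)."""
    sql = "SELECT (?, ?) IN (SELECT * FROM Likes)"
    return bool(conn.execute(sql, (liker, liked)).fetchone()[0])


print(people_who_like_someone, holds("Bob", "Dex"), holds("Alice", "Bob"))
```

Carol and Dex appear only in the `liked` column, so no value of `liked` makes Likes(person, liked) true for them and they are absent from the projection.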

Here are the natural language and the shorthand for the (person, liked) rows where [person] is Bob and Bob likes someone who likes [liked] but who doesn't like Ed.

    /* (person, liked) rows where
       for some value for x,
           [person] likes [x]
       and [x] likes [liked]
       and [person] = 'Bob'
       and not [x] likes 'Ed'
    */
    /* (person, liked) rows where
       FOR SOME [value for] x,
           Likes(person, x)
       AND Likes(x, liked)
       AND person = 'Bob'
       AND NOT Likes(x, 'Ed')
    */

Rewrite this using the predicates of our base tables, and then in SQL.

    /* (person, liked) rows where
       FOR SOME [values for] l1.*, l2.*,
           person = l1.liker AND liked = l2.liked
       AND Likes(l1.liker, l1.liked)
       AND Likes(l2.liker, l2.liked)
       AND l1.liked = l2.liker
       AND person = 'Bob'
       AND NOT Likes(l1.liked, 'Ed')
    */
    SELECT l1.liker AS person, l2.liked AS liked
    FROM
        /* (l1.liker, l1.liked, l2.liker, l2.liked) rows where
               Likes(l1.liker, l1.liked)
           AND Likes(l2.liker, l2.liked)
           AND l1.liked = l2.liker
           AND l1.liker = 'Bob'
           AND NOT Likes(l1.liked, 'Ed')
        */
        Likes l1 INNER JOIN Likes l2
        ON l1.liked = l2.liker
        WHERE l1.liker = 'Bob'
        AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)
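To see the translated query at work, here is a small sketch that runs it, assuming Python's built-in `sqlite3` module and an SQLite version with row-value support. The rows `(Dex, Fay)` and `(Dex, Ed)` are not in the original table; they are added here only so that the NOT Likes(x, 'Ed') condition actually excludes something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))"
)
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [
        ("Bob", "Dex"),
        ("Bob", "Alice"),
        ("Alice", "Carol"),
        ("Dex", "Fay"),  # assumed extra row, not in the original example
        ("Dex", "Ed"),   # assumed extra row: Dex likes Ed, so Dex is excluded as x
    ],
)

query = """
SELECT l1.liker AS person, l2.liked AS liked
FROM Likes l1 INNER JOIN Likes l2
ON l1.liked = l2.liker                              -- Likes(person, x) AND Likes(x, liked)
WHERE l1.liker = 'Bob'                              -- person = 'Bob'
AND NOT (l1.liked, 'Ed') IN (SELECT * FROM Likes)   -- NOT Likes(x, 'Ed')
"""
result = conn.execute(query).fetchall()
print(result)
```

Bob likes Dex and Alice. Dex likes Ed, so Dex is ruled out as x; Alice does not like Ed and likes Carol, which leaves the single row (Bob, Carol).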

Similarly:
• R UNION CORRESPONDING S holds the rows where the predicate of R OR the predicate of S.
• R EXCEPT S holds the rows where the predicate of R AND NOT the predicate of S.
• VALUES (C,...) ((X,...),...) holds the rows where (C = X AND ...) OR ....

    /* (person) rows where
           (FOR SOME liked, Likes(person, liked))
        OR person = 'Bob'
    */
    SELECT liker AS person
    FROM Likes
    UNION
    VALUES (person) (('Bob'))
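The OR and AND NOT readings of UNION and EXCEPT can be tried out as follows. This is a sketch under assumptions: it uses Python's built-in `sqlite3` module, and because SQLite does not accept the `VALUES (person) (('Bob'))` column-naming form, it writes the one-row table as plain `VALUES ('Bob')`. The EXCEPT query is an extra illustration not taken from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Likes (liker TEXT, liked TEXT, PRIMARY KEY (liker, liked))"
)
conn.executemany(
    "INSERT INTO Likes VALUES (?, ?)",
    [("Bob", "Dex"), ("Bob", "Alice"), ("Alice", "Carol")],
)

# (person) rows where (FOR SOME liked, Likes(person, liked)) OR person = 'Bob'
union_rows = sorted(
    row[0]
    for row in conn.execute(
        "SELECT liker AS person FROM Likes UNION VALUES ('Bob')"
    )
)

# (person) rows where (FOR SOME liked, Likes(person, liked))
#                 AND NOT (FOR SOME liker, Likes(liker, person))
except_rows = sorted(
    row[0]
    for row in conn.execute(
        "SELECT liker AS person FROM Likes EXCEPT SELECT liked FROM Likes"
    )
)
print(union_rows, except_rows)
```

Bob already likes someone, so the union adds no new row here; Alice is liked by Bob, so only Bob survives the EXCEPT.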

So if we express our desired rows in terms of the given base table natural language statement templates, which rows make true or false (to be returned or not), then we can translate to SQL queries that are nestings of logic shorthands and operators and/or table names and operators. The DBMS can then convert these entirely to table operations to calculate the rows that make our predicate true.

See "How to get matching data from another SQL table for two different columns: Inner Join and/or Union?" for another application of this to SQL. (Another self-join.)
See "Relational Algebra for Banking Scenarios" for more on natural language phrasings. (In a relational algebra context.)


Here is what I do for non-grouping queries:

In the FROM clause I put the table for which I expect zero or one output rows per row in that table. Often you want something like "all customers with certain properties". Then the customer table goes into the FROM clause.

Use joins to add columns and filter rows. Joins should not duplicate rows: a join should find zero or one matching rows, never more. This keeps things very intuitive, because you can say that "a join adds columns and filters out some rows".

Subqueries should be avoided if a join can replace them. Joins look nicer, are more general, and are often more efficient (because of common query optimizer weaknesses).

How to use WHERE and projections then follows easily.
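The "a join adds columns and filters some rows" rule can be illustrated with a lookup join. This is a sketch using Python's built-in `sqlite3` module; the `customer` and `country` tables, their columns, and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        country_code TEXT
    );
    INSERT INTO country VALUES ('DE', 'Germany'), ('FR', 'France');
    INSERT INTO customer VALUES (1, 'Ann', 'DE'), (2, 'Ben', 'FR'), (3, 'Cy', 'XX');
    """
)

# customer drives the query. country.code is a primary key, so each customer
# matches zero or one country rows: the join adds a column and can only
# filter customers out, never duplicate them.
joined = conn.execute(
    """
    SELECT c.id, c.name, k.name AS country
    FROM customer c
    JOIN country k ON k.code = c.country_code
    ORDER BY c.id
    """
).fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(joined, customer_count)
```

Customer 3 has no matching country and is filtered out, so the result has two rows for three customers. Joining on a non-unique column instead could return more rows than there are customers, which is exactly what the rule is meant to prevent.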
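As a rough illustration of replacing a subquery with a join, the sketch below runs both forms of the same request and compares the results. It assumes Python's built-in `sqlite3` module; the tables, the `region` column, and the data are hypothetical, and the example says nothing about relative performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (code TEXT PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        country_code TEXT
    );
    INSERT INTO country VALUES ('DE', 'EU'), ('FR', 'EU'), ('US', 'NA');
    INSERT INTO customer VALUES (1, 'Ann', 'DE'), (2, 'Ben', 'US'), (3, 'Cy', 'FR');
    """
)

# Subquery form: customers whose country is in the EU region.
with_subquery = conn.execute(
    """
    SELECT id FROM customer
    WHERE country_code IN (SELECT code FROM country WHERE region = 'EU')
    ORDER BY id
    """
).fetchall()

# Join form: the same rows, because country.code is unique and the join
# therefore matches at most one country per customer.
with_join = conn.execute(
    """
    SELECT c.id
    FROM customer c
    JOIN country k ON k.code = c.country_code
    WHERE k.region = 'EU'
    ORDER BY c.id
    """
).fetchall()
print(with_subquery, with_join)
```

The two forms agree here only because the join key is unique on the `country` side; with a non-unique key the join form would need DISTINCT to match the IN form.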
