What is a structured way to build a MySQL query?

I consider myself competent enough in understanding and managing C-ish languages; it’s not a problem for me to come up with an algorithm and implement it in any C-ish language.

I find it very difficult to write SQL (in my particular case, MySQL) queries. Very simple queries are not a problem, but with complex ones I get frustrated not knowing where to start. Reading the MySQL documentation is difficult, mainly because the description and explanation of the syntax are not very well organized.

For example, the SELECT documentation is all over the map: it starts with what looks like pseudo-BNF, but then (because terms in it such as select_expr are described elsewhere) it quickly turns into a frustrating exercise in piecing the syntax together across several open browser windows.

Enough whining.

I would like to know how people start building a complex MySQL query step by step. Here is an example. I have three tables below. I want a SELECT rowset with the following characteristics:

From userInfo and userProgram, I want to SELECT the userName, isApproved and modifiedTimestamp fields and UNION them into a single set. From that set, I want to ORDER BY modifiedTimestamp, taking MAX(modifiedTimestamp) for each user (i.e. there should be only one row per unique userName, and the timestamp associated with that userName should be as high as possible).

From the user table, I want to match the firstName and lastName associated with each userName, so that the result looks something like this:

    +-----------+----------+----------+-------------------+
    | firstName | lastName | userName | modifiedTimestamp |
    +-----------+----------+----------+-------------------+
    | JJ        | Prof     | jjprofUs |        1289914725 |
    | User      | 2        | user2    |        1289914722 |
    | User      | 1        | user1    |        1289914716 |
    | User      | 3        | user3    |        1289914713 |
    | User      | 4        | user4    |        1289914712 |
    | User      | 5        | user5    |        1289914711 |
    +-----------+----------+----------+-------------------+

The closest I have gotten is a query that looks like this:

    (SELECT firstName, lastName, user.userName, modifiedTimestamp
     FROM user, userInfo
     WHERE user.userName = userInfo.userName)
    UNION
    (SELECT firstName, lastName, user.userName, modifiedTimestamp
     FROM user, userProgram
     WHERE user.userName = userProgram.userName)
    ORDER BY modifiedTimestamp DESC;

I feel like I'm pretty close, but I don't know where to go from here, or even whether I'm thinking about this correctly.

    > user
    +--------------------+--------------+------+-----+---------+-------+
    | Field              | Type         | Null | Key | Default | Extra |
    +--------------------+--------------+------+-----+---------+-------+
    | userName           | char(8)      | NO   | PRI | NULL    |       |
    | firstName          | varchar(255) | NO   |     | NULL    |       |
    | lastName           | varchar(255) | NO   |     | NULL    |       |
    | email              | varchar(255) | NO   | UNI | NULL    |       |
    | avatar             | varchar(255) | YES  |     | ''      |       |
    | password           | varchar(255) | NO   |     | NULL    |       |
    | passwordHint       | text         | YES  |     | NULL    |       |
    | access             | int(11)      | NO   |     | 1       |       |
    | lastLoginTimestamp | int(11)      | NO   |     | -1      |       |
    | isActive           | tinyint(4)   | NO   |     | 1       |       |
    +--------------------+--------------+------+-----+---------+-------+

    > userInfo
    +-------------------+------------+------+-----+---------+-------+
    | Field             | Type       | Null | Key | Default | Extra |
    +-------------------+------------+------+-----+---------+-------+
    | userName          | char(8)    | NO   | MUL | NULL    |       |
    | isApproved        | tinyint(4) | NO   |     | 0       |       |
    | modifiedTimestamp | int(11)    | NO   |     | NULL    |       |
    | field             | char(255)  | YES  |     | NULL    |       |
    | value             | text       | YES  |     | NULL    |       |
    +-------------------+------------+------+-----+---------+-------+

    > userProgram
    +-------------------+--------------+------+-----+---------+-------+
    | Field             | Type         | Null | Key | Default | Extra |
    +-------------------+--------------+------+-----+---------+-------+
    | userName          | char(8)      | NO   | PRI | NULL    |       |
    | isApproved        | tinyint(4)   | NO   | PRI | 0       |       |
    | modifiedTimestamp | int(11)      | NO   |     | NULL    |       |
    | name              | varchar(255) | YES  |     | NULL    |       |
    | address1          | varchar(255) | YES  |     | NULL    |       |
    | address2          | varchar(255) | YES  |     | NULL    |       |
    | city              | varchar(50)  | YES  |     | NULL    |       |
    | state             | char(2)      | YES  | MUL | NULL    |       |
    | zip               | char(10)     | YES  |     | NULL    |       |
    | phone             | varchar(25)  | YES  |     | NULL    |       |
    | fax               | varchar(25)  | YES  |     | NULL    |       |
    | ehsChildren       | int(11)      | YES  |     | NULL    |       |
    | hsChildren        | int(11)      | YES  |     | NULL    |       |
    | siteCount         | int(11)      | YES  |     | NULL    |       |
    | staffCount        | int(11)      | YES  |     | NULL    |       |
    | grantee           | varchar(255) | YES  |     | NULL    |       |
    | programType       | varchar(255) | YES  |     | NULL    |       |
    | additional        | text         | YES  |     | NULL    |       |
    +-------------------+--------------+------+-----+---------+-------+
5 answers

From what I understand of your question, you seem to need a correlated subquery, something like this:

    (SELECT firstName, lastName, user.userName, modifiedTimestamp
     FROM user, userInfo ui1
     WHERE user.userName = ui1.userName
       AND modifiedTimestamp = (SELECT MAX(modifiedTimestamp)
                                FROM userInfo ui2
                                WHERE ui1.userName = ui2.userName))
    UNION
    (SELECT firstName, lastName, user.userName, modifiedTimestamp
     FROM user, userProgram up1
     WHERE user.userName = up1.userName
       AND modifiedTimestamp = (SELECT MAX(modifiedTimestamp)
                                FROM userProgram up2
                                WHERE up1.userName = up2.userName))
    ORDER BY modifiedTimestamp DESC;

So, how do I go about getting to this result? The key is to express clearly the information you want to retrieve, without taking mental shortcuts.

Step 1: Select the fields I need from the different tables of my database. This is what goes between SELECT and FROM. It seems obvious, but it becomes less obvious when aggregate functions such as sums or counts are involved. In that case, you should say, for example: "I need the count of rows in userInfo for each userName." See GROUP BY below.

Step 2: Knowing the fields you need, write the joins between the corresponding tables. That part is easy ...

Step 3: Express your conditions. This can be easy, for example if you want the data for the user with userName = "RZEZDFGBH", or more complex, as in your case: if you want only the most recent modified timestamp, the way to phrase it so that it can be written as SQL is "where the modified timestamp is equal to the most recent modified timestamp" (this is where you can easily take a mental shortcut and miss the point).

Step 4: If you have aggregates, it's time to write the GROUP BY clause. For example, if you count all the rows in userInfo for each userName, you write "GROUP BY userName":

    SELECT userName, COUNT(*) FROM userInfo GROUP BY userName

This gives you the number of rows in the table for each distinct userName.

Step 5: HAVING conditions. These are conditions on the aggregates. In the previous example, if you only want data for userNames having more than 5 rows in the table, you can write:

    SELECT userName, COUNT(*) FROM userInfo GROUP BY userName HAVING COUNT(*) > 5

Step 6: Sort with ORDER BY. Pretty easy ...
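To make the steps concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), a cut-down userInfo table, and made-up sample rows. It groups on userName, the column userInfo has in the question's schema, and the HAVING threshold is lowered to 1 because the sample is tiny.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userInfo (userName TEXT, modifiedTimestamp INTEGER);
    INSERT INTO userInfo VALUES
        ('user1', 10), ('user1', 20), ('user1', 30),
        ('user2', 15), ('user2', 25),
        ('user3', 5);
""")

# Steps 1, 4, 5 and 6: choose the output, group, filter the groups, sort.
rows_per_user = conn.execute("""
    SELECT userName, COUNT(*) AS n   -- step 1: what to output
    FROM userInfo                    -- step 2: where it comes from
    GROUP BY userName                -- step 4: one group per user
    HAVING COUNT(*) > 1              -- step 5: condition on the aggregate
    ORDER BY n DESC                  -- step 6: sort
""").fetchall()
print(rows_per_user)  # [('user1', 3), ('user2', 2)]

# Step 3 spelled out literally: "the timestamp equals the most recent
# timestamp for the same user", written as a correlated subquery.
latest = conn.execute("""
    SELECT userName, modifiedTimestamp
    FROM userInfo ui1
    WHERE modifiedTimestamp = (SELECT MAX(modifiedTimestamp)
                               FROM userInfo ui2
                               WHERE ui2.userName = ui1.userName)
    ORDER BY userName
""").fetchall()
print(latest)  # [('user1', 30), ('user2', 25), ('user3', 5)]
```

Both statements should run unchanged on MySQL; only the Python wrapper around them is specific to this sketch.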

This is just a brief summary. There is much, much more to discover, but it would be too long to write an entire SQL course here ... Hope this helps, though!


As f00 says, it's simple(r) if you think about data in terms of sets.

One problem with the question is that the expected result does not match the stated requirements: the isApproved column is mentioned in the description, but it appears nowhere in the query or in the expected output.

What this illustrates is that the first step in writing a query is to have a clear idea of what you want to achieve. The big problem with this question is that this is not clearly described. Instead, it jumps from a sample table of the expected output (which would be more useful if we had matching samples of the expected input) straight to a description of how you intend to achieve it.

As I understand it, what you want to see is a list of users (by username, with their first and last names), along with the last time any related record was modified in either the userInfo or userProgram table.

(It is unclear whether you want to see users who have no related activity in either of these other tables. Your query implies not; otherwise the joins would be outer joins.)

So, you need a list of users (by username, with their first and last names):

 SELECT firstName, lastName, userName FROM user 

along with a list of the times their records were last modified:

 SELECT userName, MAX(modifiedTimestamp) 

...

in userInfo or userProgram tables:

...

    FROM (SELECT userName, modifiedTimestamp FROM userInfo
          UNION ALL
          SELECT userName, modifiedTimestamp FROM userProgram
         ) subquery -- <- this is an alias

...

by username:

...

 group by userName 

These two data sets must be linked by their username, so the final query will look like this:

    SELECT user.firstName, user.lastName, user.userName,
           MAX(subquery.modifiedTimestamp) last_modifiedTimestamp
    FROM user
    JOIN (SELECT userName, modifiedTimestamp FROM userInfo
          UNION ALL
          SELECT userName, modifiedTimestamp FROM userProgram
         ) subquery ON user.userName = subquery.userName
    GROUP BY user.userName

In most SQL dialects, this query would return an error because user.firstName and user.lastName are neither included in the GROUP BY clause nor aggregated. MySQL allows this syntax. In other dialects, since these fields are functionally dependent on userName, wrapping each field in MAX or adding the fields to the GROUP BY clause will achieve the same result.
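As a quick check of the assembled query, here is a small sketch that runs it against made-up data. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for portability), trims the tables to the columns involved, and uses the portable variant with firstName and lastName added to the GROUP BY. The ORDER BY is also an addition, taken from the ordering the question asks for.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; tables are trimmed, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userName TEXT PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE userInfo (userName TEXT, modifiedTimestamp INTEGER);
    CREATE TABLE userProgram (userName TEXT, modifiedTimestamp INTEGER);
    INSERT INTO user VALUES ('user1', 'User', '1'), ('user2', 'User', '2');
    INSERT INTO userInfo VALUES ('user1', 100), ('user1', 300), ('user2', 200);
    INSERT INTO userProgram VALUES ('user1', 250), ('user2', 400);
""")

result = conn.execute("""
    SELECT user.firstName, user.lastName, user.userName,
           MAX(subquery.modifiedTimestamp) AS last_modifiedTimestamp
    FROM user
    JOIN (SELECT userName, modifiedTimestamp FROM userInfo
          UNION ALL
          SELECT userName, modifiedTimestamp FROM userProgram) subquery
      ON user.userName = subquery.userName
    -- portable form: every non-aggregated output column is grouped
    GROUP BY user.userName, user.firstName, user.lastName
    ORDER BY last_modifiedTimestamp DESC
""").fetchall()

for row in result:
    print(row)
# ('User', '2', 'user2', 400)
# ('User', '1', 'user1', 300)
```

Each user appears once, with the highest timestamp found across both tables.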

A few additional points:

  • UNION and UNION ALL are not identical: the former removes duplicates and the latter does not, which makes the former more processor-intensive. Since duplicates will be eliminated by the grouping anyway, it is better to use UNION ALL.
  • Many people would write this query as user joined to userInfo, UNION ALLed with user joined to userProgram, because many SQL engines can optimize that form more effectively. At this point, that would be premature optimization.
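
The UNION versus UNION ALL point can be seen on a tiny example. This is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with invented rows. The helper names count_rows and max_per_user are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL. ('user1', 100) occurs in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userInfo (userName TEXT, modifiedTimestamp INTEGER);
    CREATE TABLE userProgram (userName TEXT, modifiedTimestamp INTEGER);
    INSERT INTO userInfo VALUES ('user1', 100), ('user2', 200);
    INSERT INTO userProgram VALUES ('user1', 100), ('user2', 300);
""")

COMBINED = """SELECT userName, modifiedTimestamp FROM userInfo
              {op}
              SELECT userName, modifiedTimestamp FROM userProgram"""

def count_rows(op):
    """Number of rows in the combined set for the given set operator."""
    sql = f"SELECT COUNT(*) FROM ({COMBINED.format(op=op)}) AS t"
    return conn.execute(sql).fetchone()[0]

def max_per_user(op):
    """Latest timestamp per user over the combined set."""
    sql = f"""SELECT userName, MAX(modifiedTimestamp)
              FROM ({COMBINED.format(op=op)}) AS t
              GROUP BY userName
              ORDER BY userName"""
    return conn.execute(sql).fetchall()

print(count_rows("UNION"))      # 3: the duplicate row is removed
print(count_rows("UNION ALL"))  # 4: every row is kept
# After grouping, both operators give the same answer.
print(max_per_user("UNION") == max_per_user("UNION ALL"))  # True
```

The grouped result is identical either way, so the deduplication done by UNION is work the query does not need.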

There is a lot of good stuff here. Thanks to everyone who contributed. This is a summary of what I found useful, along with some additional thoughts on relating function building to query building. I wish I could give everyone SO badges / points, but I think there can be only one (answer), so I'm choosing Trarot based on total points and personal helpfulness.

A function can be understood as three parts: input, process, output. A query can be understood similarly. Most queries look something like this:

 SELECT stuff FROM data WHERE data is like something 
  • The SELECT part is the output. There are some possibilities for formatting the output (i.e. using AS )

  • The FROM part is the input. The input should be thought of as a pool of data; you will want to make it as specific as possible, using whatever joins and subqueries suit.

  • The WHERE part is like the process, but there is a lot of overlap with the FROM part. Both the FROM and WHERE parts can narrow the data pool by using various conditions to filter out unwanted data (or to include only the desired data). The WHERE part can also help shape the output.

This is how I would break down the steps:

  • Start by thinking about what your output looks like. This goes in the SELECT part.

  • Next, define the data set you want to work on. One answer remarks: “Knowing the fields you need, write the joins between the corresponding tables. That part is easy ...” It depends on what you mean by “easy.” If you are new to writing queries, you have probably just been writing inner joins by default (like me). That is not always the best way. http://en.wikipedia.org/wiki/Join_(SQL) is an excellent resource for understanding the kinds of joins available.

  • As part of the previous step, think about the smaller pieces of that data set and build up to the complete data set you are interested in. When writing a function, you can write subfunctions to help express your process more cleanly. Similarly, you can write subqueries. Great advice from Mark Bannister on creating a subquery AND USING AN ALIAS. You will need to adjust your output to use this alias, but it is a pretty important point.

  • Finally, you can use various methods to pare down the data by removing what you are not interested in.

One way to think about the data you are working on is as a giant two-dimensional matrix: JOIN enlarges the horizontal aspect, UNION enlarges the vertical aspect. All the other filters are there to shape this matrix into something suitable for your output. I don't know if there is a “functional” analogy for JOIN, but UNION just adds the output of two functions together.
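
The matrix picture can be checked directly by counting rows and columns. This is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL, with trimmed tables and invented rows. The helper name shape is hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; tables are trimmed, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userName TEXT, firstName TEXT);
    CREATE TABLE userInfo (userName TEXT, modifiedTimestamp INTEGER);
    CREATE TABLE userProgram (userName TEXT, modifiedTimestamp INTEGER);
    INSERT INTO user VALUES ('user1', 'User'), ('user2', 'User');
    INSERT INTO userInfo VALUES ('user1', 100), ('user2', 200);
    INSERT INTO userProgram VALUES ('user1', 150), ('user2', 250);
""")

def shape(sql):
    """Return (row count, column count) of a query result."""
    cur = conn.execute(sql)
    rows = cur.fetchall()
    return len(rows), len(cur.description)

info_shape = shape("SELECT * FROM userInfo")
join_shape = shape(
    "SELECT * FROM user JOIN userInfo ON user.userName = userInfo.userName")
union_shape = shape(
    "SELECT * FROM userInfo UNION ALL SELECT * FROM userProgram")

print(info_shape)   # (2, 2)
print(join_shape)   # (2, 4): the join adds columns
print(union_shape)  # (4, 2): the union adds rows
```

In this sample each user has exactly one userInfo row, so the join leaves the row count unchanged. With several matching rows per user, a join would also add rows.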

I realize, however, that there are many ways in which building a query is NOT like writing a function. For example, you can both build up and pare down a dataset in the FROM and WHERE parts. What was key for me was understanding joins and figuring out how to create subqueries using aliases.


just learn to think in terms of sets - then it's simple :P

http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html


You cannot build SQL without understanding the data in the tables and the logical result required. There is no background here on what the data in the tables might look like or what the values mean, and the description of the results you are trying to collect does not make sense to me, so I'm not going to hazard a guess.

On that last point: it is rare to want to union timestamp values from multiple sources. Generally speaking, when such results are collected, they are used for some kind of audit or tracking. However, when you discard all information about the source of the timestamp and simply compute the maximum, you have ... well, what exactly?

In any case, one or more examples of the data and the desired result, and possibly something about the application and the reasons behind it, would be needed to make sense of this.

To the extent that I will make any prediction about the form of your final statement (assuming your task is still to get the maximum timestamp for each user), it would look something like this:

    select u.firstname, u.lastname, user_max_time.userName, user_max_time.max_time
    from users u,
         (select (sometable).userName,
                 max((sometable).(timestamp column)) as max_time
          from (data of interest)
          group by (sometable).userName) user_max_time
    where u.userName = user_max_time.userName
    order by max_time desc;

Now your task would be to replace the ()s in the user_max_time subquery with something that makes sense and meets your requirements. As for a general approach to complex SQL, the main suggestion is to build the query from the innermost subqueries outward (testing along the way to make sure performance is OK and that you don't need intermediate tables).
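
Here is one way the inside-out approach might look in practice. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, fills in the placeholders with the question's userInfo and userProgram tables (an assumption about what "data of interest" means), uses the question's table name user, and runs on invented rows. The variable names inner, middle and outer are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; tables are trimmed, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userName TEXT PRIMARY KEY, firstName TEXT, lastName TEXT);
    CREATE TABLE userInfo (userName TEXT, modifiedTimestamp INTEGER);
    CREATE TABLE userProgram (userName TEXT, modifiedTimestamp INTEGER);
    INSERT INTO user VALUES ('user1', 'User', '1'), ('user2', 'User', '2');
    INSERT INTO userInfo VALUES ('user1', 100), ('user1', 300), ('user2', 200);
    INSERT INTO userProgram VALUES ('user1', 250), ('user2', 400);
""")

# Level 1 (innermost): the raw data of interest.
inner = """SELECT userName, modifiedTimestamp FROM userInfo
           UNION ALL
           SELECT userName, modifiedTimestamp FROM userProgram"""

# Level 2: wrap level 1 and aggregate it.
middle = f"""SELECT userName, MAX(modifiedTimestamp) AS max_time
             FROM ({inner}) AS activity
             GROUP BY userName"""

# Level 3 (outermost): join the aggregate to the user table.
outer = f"""SELECT u.firstName, u.lastName,
                   user_max_time.userName, user_max_time.max_time
            FROM user u
            JOIN ({middle}) AS user_max_time
              ON u.userName = user_max_time.userName
            ORDER BY user_max_time.max_time DESC"""

# Run and inspect every level before wrapping it in the next one.
for label, sql in (("inner", inner), ("middle", middle), ("outer", outer)):
    print(label, conn.execute(sql).fetchall())

final_rows = conn.execute(outer).fetchall()
```

Checking each level on its own makes it much easier to see which layer is responsible when the final result looks wrong.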

In any case, if you run into problems and can come back with examples, I'd be happy to help.

Cheers, Ben
