I am trying to write a query with two subqueries. Queries work when they are run individually, but when I combine them, I don’t get the desired result set. I will try to give a minimal example.
Basic request:
mysql> select target_name_id, ep, count(*), count(distinct wafer_id) from data_cst where target_name_id = 155609 and data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' GROUP BY target_name_id, data_cst.ep; +----------------+------+----------+--------------------------+ | target_name_id | ep | count(*) | count(distinct wafer_id) | +----------------+------+----------+--------------------------+ | 155609 | Line | 4799 | 215 | +----------------+------+----------+--------------------------+ 1 row in set (0.05 sec)
First subquery:
mysql> SELECT target_name_id,ep, wafer_id, AVG(bottom) as averages, FROM data_cst WHERE target_name_id = 155609 AND data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' GROUP BY target_name_id, ep, wafer_id; +----------------+------+----------+------------+ | target_name_id | ep | wafer_id | averages | +----------------+------+----------+------------+ | 155609 | Line | 401739 | 47.6236667 | | 155609 | Line | 403041 | 47.3739167 | | 155609 | Line | 408339 | 47.4901667 | | 155609 | Line | 409683 | 48.3066250 | | 155609 | Line | 409690 | 47.2402500 | | 155609 | Line | 410249 | 47.3346667 | | 155609 | Line | 410633 | 48.7373333 | | 155609 | Line | 414000 | 48.1274167 | . . . 215 rows in set (0.07 sec)
Second subquery:
mysql> SELECT target_name_id, ep, data_file_id, lot_id, wafer_id, date_time, COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id)-1 as reruns FROM data_cst WHERE target_name_id = 155609 AND data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' GROUP BY target_name_id, data_cst.ep, wafer_id HAVING COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id) > 1; +----------------+------+--------------+--------+----------+---------------------+--------+ | target_name_id | ep | data_file_id | lot_id | wafer_id | date_time | reruns | +----------------+------+--------------+--------+----------+---------------------+--------+ | 155609 | Line | 774 | 120804 | 403041 | 2012-07-06 03:51:50 | 1 | | 155609 | Line | 6502 | 123109 | 409683 | 2012-07-16 05:10:04 | 1 | | 155609 | Line | 749 | 120804 | 409690 | 2012-07-06 04:08:01 | 1 | | 155609 | Line | 3319148 | 123484 | 410633 | 2012-07-07 09:12:20 | 5 | | 155609 | Line | 8264 | 134609 | 414098 | 2012-07-03 11:34:12 | 5 | | 155609 | Line | 3279867 | 124752 | 414245 | 2012-06-26 00:51:31 | 1 . . . 93 rows in set (0.06 sec)
Now that I put them all together, I want the calculations from the main query to be average of the second, and the sum of the repetition columns from the third. I have been doing this for 3 days, and I can’t come up with the right combination to get the results I want. I was able to get the amount to get it right, or the calculations, or the average, but not all 3. Here is my last attempt:
mysql> select data_cst.target_name_id, data_cst.ep, count(*) as count, count(distinct data_cst.wafer_id) as wafers, avg(averages) as average, sum(reruns) as rerun from data_cst, (SELECT target_name_id,ep, wafer_id, AVG(bottom) as averages FROM data_cst WHERE target_name_id = 155609 AND data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' GROUP BY target_name_id, ep, wafer_id) q1, (SELECT target_name_id, ep, data_file_id, lot_id, wafer_id, date_time, COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id)-1 as reruns FROM data_cst WHERE target_name_id = 155609 AND data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' GROUP BY target_name_id, data_cst.ep, wafer_id HAVING COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id) > 1) r where data_cst.target_name_id = 155609 AND data_cst.date_time BETWEEN '2012-06-23 00:00:00' AND '2012-08-23 23:59:59' and data_cst.wafer_id = q1.wafer_id GROUP BY target_name_id, data_cst.ep; +----------------+------+--------+--------+----------------+--------+ | target_name_id | ep | count | wafers | average | rerun | +----------------+------+--------+--------+----------------+--------+ | 155609 | Line | 446307 | 215 | 48.12877962148 | 724649 | +----------------+------+--------+--------+----------------+--------+ 1 row in set (23.56 sec)
With this outward majority of cases where, the number of waffles and the average value are correct, but the count and repeat are not. I can use another where clause and get the correct reconfiguration, but then the count and waffles are wrong. I can use another different where clause and count the score correctly, but repeating it again is wrong.
I have been doing this for 3 days and I just can't find the where clause that works for me.
This is an update to my question:
I changed my query suggested by Gordon Linoff, and since then the client has added many new requirements that I could include in the query. But now they have added some, I can’t understand how this works.
Now my request is as follows:
SELECT data_target.name as Target, q1.ep as EP, COUNT(*) as Wafers, Lots, SUM(numonep)/(COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END)) as 'Sites/Wafer', MAX(LastRun) as "Last Run", SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Rerun, COUNT(*)+SUM(CASE WHEN reruns > 0 THEN reruns ELSE 0 END) as Runs, avgbottom as "Avg Bottom", 3*stdbottom as "3 Sig", maxbottom as Max, minbottom as Min, SUM(numonep) as Count, SUM(numonep) - SUM(numbottoms) as NAs, 100-((SUM(numonep) - SUM(numbottoms))/SUM(numonep)*100) as "% Success", 3*stdbottom/avgbottom as "3Sig/Avg", AVG(avgbottom) as 'Wafer Avg', AVG(Wafer3Sigma) as 'Wafer 3 Sigma', AVG(Ranges) as 'Avg Range', 3*STD(Ranges) as '3Sig of Ranges', MAX(Ranges) as 'Max Range', MIN(Ranges) as 'Min Range', (SUM(numonep) - SUM(numbottoms))/COUNT(*) as 'NAs/Wafer' FROM (SELECT target_name_id, ep, wafer_id, COUNT(bottom) as numbottoms, AVG(bottom) as avgbottom, STD(bottom) as stdbottom, MAX(bottom) as maxbottom, MIN(bottom) as minbottom, MAX(date_time) as "LastRun", COUNT(*) as numonep, COUNT(DISTINCT target_name_id, ep, lot_id, data_file_id)-1 as reruns, COUNT(DISTINCT(lot_id)) as Lots, 3*STD(bottom) as Wafer3Sigma, MAX(bottom) - MIN(bottom) as Ranges FROM data_cst WHERE target_name_id IN (775, 776, 777, 778, 779, 780, 45, 44, 116, 117, 118, 119, 120, 121) AND data_cst.date_time BETWEEN '2010-03-04 00:00:00' AND '2010-03-04 23:59:59' GROUP BY target_name_id, ep, wafer_id HAVING count(*) < 999) q1, data_target WHERE data_target.id = target_name_id GROUP BY q1.target_name_id, q1.ep;
And it works great. But now they want me to get a specific column (image_measurer_id) with each row returned, which corresponds to a row from the group that has the bottom = Min (bottom), bottom = Max (bottom), the bottom is closer to the middle (bottom) and bottom of the line where date_time = Max (date_time).
Is this possible from this one query?