In the inner join, how to select only one row from the table on the right, based on sliding criteria?

I have a set of tables containing weeks, products, inventory and weekly forecasts, from which I want to choose an inventory of goods in week X and the latest forecasts. But I just can't figure out SQL:

create table products ( product_id integer ); create table inventory ( product_id integer, asof_week integer, qoh float8 ); create table forecast ( product_id integer, for_week integer, asof_week integer, projection float8 ); create table weeks ( wkno integer ); insert into weeks values (4),(5),(6),(7); insert into products values(1),(2); insert into inventory values(1,5,10),(1,6,20),(2,6,200); insert into forecast values(1,4,1,10),(1,4,2,11),(1,4,3,12),(1,4,4,13), (1,5,1,11),(1,5,2,11),(1,5,3,21),(1,5,4,31), --corr:one too many (1,6,1,10),(1,6,2,11),(1,6,3,12),(1,6,4,22),(1,6,5,32),(1,6,5,42),(1,6,6,42), (1,6,1,10),(1,6,2,11),(1,6,3,12),(1,6,4,22),(1,6,5,42),(1,6,6,42), (1,7,1,10),(1,7,6,16), (2,6,5,2000),(2,7,5,2100),(2,8,5,30); 

And request:

 select p.product_id "product", i.asof_week "inven asof", i.qoh "qoh", f.for_week "fcast for", f.projection "fcast qty", f.asof_week "fcast asof" from weeks w, products p left join inventory i on(p.product_id = i.product_id) left join forecast f on(p.product_id = f.product_id) where (i.asof_week is null or i.asof_week = w.wkno) and (f.for_week is null or f.for_week = w.wkno) and (f.asof_week is null or f.asof_week = (select max(f2.asof_week) from forecast f2 where f2.product_id = f.product_id and f2.for_week = f.for_week)) order by p.product_id, i.asof_week, f.for_week, f.asof_week 

For example, for weeks 4-7, I am looking for a set of results:

 product week qoh projection 1 4 - 13 1 5 10 31 1 6 20 42 1 7 - 16 2 6 200 2000 2 7 - 2100 

BUT in reality I get only 3 lines:

  product | inven asof | qoh | fcast for | fcast qty | fcast asof ---------+------------+-----+-----------+-----------+------------ 1 | 5 | 10 | 5 | 31 | 4 1 | 6 | 20 | 6 | 42 | 6 2 | 6 | 200 | 6 | 2000 | 5 (3 rows) Time: 2.531 ms 

I am new to SQL and can use some useful pointers.

Some data notes: I have several other data tables that I illustrated in this example to focus on this problem, at least one of them is similar in nature to a table of predicted values ​​(i.e. with several version lines for each product x week). There are about 100 forecast lines for each product X week, so somewhere I also have to worry about efficiency ... but first correct the results.

I'm on postgresql 9.2.

Thanks.

+4
source share
3 answers

It is difficult to give general pointers without knowing the rest of your data model, but I have to say the following: I usually find that queries are easier to reason when I keep them as flat as possible. In addition, as soon as I have a bunch of null checks, I either try to add guarantees to my data, or re-draw my query around another "root" table.

In any case, the following should work for you (although I cannot guarantee that it will work for any data, especially if there are duplicates):

 select products.product_id, weeks.wkno, inventory.qoh, max(projection) from forecast join products on products.product_id = forecast.product_id join weeks on weeks.wkno = forecast.for_week left join inventory on inventory.product_id = products.product_id and inventory.asof_week = weeks.wkno group by products.product_id, weeks.wkno, inventory.qoh 

Sorry, I can’t give you so much advice. Hope this helps.

Change Modify the request to remove the cross-connection. The original version is here . You may need to cross-connect if you want to leave consolidation forecasts if some of them are missing. For your specific example, this is unnecessary.

Change 2 . The above query is semantically incorrect. below is correct but not an illustration of my point.

 select p.product_id, p.wkno, p.qoh, f.projection from (select products.product_id, weeks.wkno, inventory.qoh, max(forecast.asof_week) max_p from forecast join products on products.product_id = forecast.product_id join weeks on weeks.wkno = forecast.for_week left join inventory on inventory.product_id = products.product_id and inventory.asof_week = weeks.wkno group by products.product_id, weeks.wkno, inventory.qoh) as p join forecast f on f.product_id = p.product_id and f.for_week = p.wkno and f.asof_week = p.max_p 
+2
source

There are apparently some PK / FK limitations in the data:

 CREATE TABLE products ( product_id INTEGER PRIMARY KEY ); CREATE TABLE weeks ( wkno INTEGER PRIMARY KEY ); CREATE TABLE inventory ( product_id INTEGER REFERENCES products(product_id) , asof_week INTEGER REFERENCES weeks(wkno) , qoh float8 , PRIMARY KEY (product_id,asof_week) ); CREATE TABLE forecast ( product_id INTEGER REFERENCES products(product_id) , for_week INTEGER REFERENCES weeks(wkno) , asof_week INTEGER REFERENCES weeks(wkno) , projection FLOAT8 , PRIMARY KEY (product_id,for_week,asof_week) ); INSERT INTO weeks VALUES (4),(5),(6),(7) , (1),(2),(3), (8) -- need these, too ; -- et cetera. 

If the weeks table is intended as a calendar table, it can (and should ) be replaced with the generate_series(4,7) pseudo-table generate_series(4,7) . (and FC restrictions have ended)

The query is severely affected by the LEFT JOIN + MAX (aggregate) construct. The following should do the same and look simpler ( NOT EXISTS to save ...):

 SELECT p.product_id "product" , i.asof_week "inven asof" , i.qoh "qoh" , f.for_week "fcast for" , f.projection "fcast qty" , f.asof_week "fcast asof" FROM products p CROSS JOIN weeks w LEFT JOIN inventory i ON i.product_id = p.product_id AND i.asof_week = w.wkno LEFT JOIN forecast f ON f.product_id = p.product_id AND f.for_week = w.wkno WHERE NOT EXISTS ( SELECT * FROM forecast f2 WHERE f2.product_id = f.product_id AND f2.for_week = f.for_week AND f2.asof_week < f.asof_week ) AND COALESCE(i.asof_week,f.for_week) IS NOT NULL ORDER BY p.product_id, i.asof_week, f.for_week, f.asof_week ; 
+1
source

Thanks to Julien for the tip. This gives a result, although I'm not sure if this is the best approach or how it will work if I have more than 100 million lines, since I am still working with sets of toys. Probably the first bad thing pw not listed below.

 with pw as ( select * from products, weeks ) select pw.product_id "product", pw.wkno, i.asof_week "inven asof", coalesce(i.qoh::text,'missing') "qoh", f.for_week "fcast for", coalesce(f.projection::text,'no fcast') "fcast qty", f.asof_week "fcast asof" from pw left join inventory i on(pw.product_id = i.product_id and pw.wkno = i.asof_week ) left join forecast f on(pw.product_id = f.product_id and f.for_week = pw.wkno and f.asof_week = (select max(f2.asof_week) from forecast f2 where f2.product_id = pw.product_id and f2.asof_week < pw.wkno and f2.for_week = pw.wkno)) where not (i.asof_week is null and f.asof_week is null) order by pw.product_id, pw.wkno, f.for_week, f.asof_week 

what gives

  product | wkno | inven asof | qoh | fcast for | fcast qty | fcast asof ---------+------+------------+---------+-----------+-----------+------------ 1 | 4 | | missing | 4 | 12 | 3 1 | 5 | 5 | 10 | 5 | 31 | 4 1 | 6 | 6 | 20 | 6 | 42 | 5 1 | 7 | | missing | 7 | 16 | 6 2 | 6 | 6 | 200 | 6 | 2000 | 5 2 | 7 | | missing | 7 | 2100 | 5 (6 rows) Time: 2.999 ms 
0
source

All Articles