SQL: comparing two tables for missing records and then by date fields

I have the two tables below:

work_assignments

    emp_id | start_date  | End Date
    -------+-------------+------------
    1      | May-10-2017 | May-30-2017
    1      | Jun-05-2017 | null
    2      | May-08-2017 | null

hourly_pay

    emp_id | start_date  | End Date    | Rate
    -------+-------------+-------------+------
    1      | May-20-2017 | Jun-30-2017 | 75
    1      | Jul-01-2017 | null        | 80

These two tables share the foreign key emp_id (employee ID). Joining the two, I should be able to:

  • Find employee records that are missing from the hourly_pay table. Given the data here, the query should return emp_id 2 from the work_assignments table.
  • Find the entries where the hourly_pay start_date is later than the work_assignments start_date. Again, given the data here, the query should return emp_id 1 (because work_assignments.start_date is May-10-2017, and the earliest hourly_pay.start_date is May-20-2017).

I can achieve the first part using the join query below:

    select distinct emp_id
    from work_assignments
    left join hourly_pay hr USING (emp_id)
    where hr.emp_id is null
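
To sanity-check the first part, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the same anti-join. SQLite is only a stand-in for PostgreSQL here. The dates are assumed to be stored as ISO-8601 text so that string comparison orders them correctly, and the join uses ON instead of USING to keep the alias references unambiguous.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Part 1: employees with no hourly_pay rows at all (LEFT JOIN ... IS NULL anti-join).
MISSING_PAY_SQL = """
SELECT DISTINCT wa.emp_id
FROM work_assignments AS wa
LEFT JOIN hourly_pay AS hr ON hr.emp_id = wa.emp_id
WHERE hr.emp_id IS NULL
ORDER BY wa.emp_id
"""

missing = [emp_id for (emp_id,) in make_db().execute(MISSING_PAY_SQL)]
print(missing)
```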

I'm stuck on the second part, where I probably need a correlated subquery to find the hourly_pay entries that didn't start before the work_assignments start_date. Or is there another way?
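
One way to express the correlated subquery described here is to compare each assignment's start_date with that employee's earliest hourly_pay start_date. The sketch below runs this idea against the sample data in SQLite (a stand-in for PostgreSQL, with ISO-8601 text dates as an assumption). It is one possible formulation, not the only one.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Part 2 as a correlated subquery: keep an assignment when the employee's earliest
# hourly_pay start_date is later than that assignment's start_date. Employees with
# no pay rows get NULL from MIN(), so the comparison is not true and they are
# left to the part 1 query.
LATE_PAY_SQL = """
SELECT DISTINCT wa.emp_id
FROM work_assignments AS wa
WHERE (SELECT MIN(hp.start_date)
       FROM hourly_pay AS hp
       WHERE hp.emp_id = wa.emp_id) > wa.start_date
ORDER BY wa.emp_id
"""

late = [emp_id for (emp_id,) in make_db().execute(LATE_PAY_SQL)]
print(late)
```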

Tags: sql, postgresql

12 answers

This points to a BETWEEN condition, with some twists, but I have had very bad luck using BETWEEN in joins. It seems like the database does some form of cross join behind the scenes and then filters the result, where-clause style. I know that this is not very technical, but I have never had a join on anything other than an equality condition turn out well.

So this may seem counterintuitive, but I think exploding all of the date ranges into individual days might be your best bet. Without knowing how large your date ranges are, it's really hard to say.

In addition, I think this will actually satisfy both conditions in your question at once, by reporting all work assignments that do not have corresponding pay rates.

Try this against your actual data and see how it performs (and how long it takes).

    with pay_dates as (
      select emp_id, rate,
             generate_series(start_date, coalesce(end_date, current_date), interval '1 day') as pd
      from hourly_pay
    ),
    assignment_dates as (
      select emp_id, start_date,
             generate_series(start_date, coalesce(end_date, current_date), interval '1 day') as wd
      from work_assignments
    )
    select emp_id, min(wd)::date as from_date, max(wd)::date as thru_date
    from assignment_dates a
    where not exists (
      select null
      from pay_dates p
      where p.emp_id = a.emp_id
        and a.wd = p.pd
    )
    group by emp_id, start_date

The result should be every work-assignment date range that has no pay rate:

    emp_id | from_date  | thru_date
    1      | 2017-05-10 | 2017-05-19
    2      | 2017-05-08 | 2017-11-14

A nice side effect is that this also catches any overlap where a work assignment was only partially covered.

- Edit 3/20/2018 -

As requested, here is a breakdown of what the logic does.

    with pay_dates as (
      select emp_id, rate,
             generate_series(start_date, coalesce(end_date, current_date), interval '1 day') as pd
      from hourly_pay
    )

This takes the hourly_pay data and explodes it into one record per employee per day:

    emp_id | rate | pay date
    1      | 75   | 5/20/17
    1      | 75   | 5/21/17
    1      | 75   | 5/22/17
    ...
    1      | 75   | 6/30/17
    1      | 80   | 7/01/17
    1      | 80   | 7/02/17
    ...
    1      | 80   | today
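
The same per-day expansion can be sketched in plain Python, which makes it easy to see what generate_series produces for each hourly_pay row. The explode helper and the fixed TODAY value (a hypothetical stand-in for current_date) are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Iterator, Optional


def explode(start: date, end: Optional[date], today: date) -> Iterator[date]:
    """Yield each day from start to end inclusive, running an open end up to `today`.

    Mirrors generate_series(start_date, coalesce(end_date, current_date), interval '1 day').
    """
    last = end if end is not None else today
    day = start
    while day <= last:
        yield day
        day += timedelta(days=1)


# hourly_pay sample rows: (emp_id, start_date, end_date, rate)
HOURLY_PAY = [
    (1, date(2017, 5, 20), date(2017, 6, 30), 75),
    (1, date(2017, 7, 1), None, 80),
]
TODAY = date(2017, 7, 3)  # hypothetical stand-in for current_date

# One (emp_id, rate, pd) row per employee per day, like the pay_dates CTE.
pay_dates = [
    (emp_id, rate, pd)
    for emp_id, start, end, rate in HOURLY_PAY
    for pd in explode(start, end, TODAY)
]

# Show the rows around the switch from the 75 rate to the 80 rate.
for row in pay_dates[40:44]:
    print(row)
```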

Next:

    -- [implied "with": this continues the same WITH clause]
    assignment_dates as (
      select emp_id, start_date,
             generate_series(start_date, coalesce(end_date, current_date), interval '1 day') as wd
      from work_assignments
    )

This does effectively the same for the work_assignments table, also keeping the assignment's start_date on each row.

Then the main query is as follows:

    select emp_id, min(wd)::date as from_date, max(wd)::date as thru_date
    from assignment_dates a
    where not exists (
      select null
      from pay_dates p
      where p.emp_id = a.emp_id
        and a.wd = p.pd
    )
    group by emp_id, start_date

This selects from the two CTEs above. The important part is the anti-join:

    not exists (
      select null
      from pay_dates p
      where p.emp_id = a.emp_id
        and a.wd = p.pd
    )

This keeps each work-assignment day for which there is no corresponding pay entry for that employee on that day.

So, in essence, the query takes the date ranges from both tables, expands them into every individual date, and then performs an anti-join to see where they do not match.
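
The whole pipeline (explode both tables into days, then anti-join) can be tried end to end in SQLite, used here only as a stand-in for PostgreSQL. generate_series is an extension rather than a core SQLite function, so this sketch emulates each CTE with a recursive CTE that steps one day at a time with date(..., '+1 day'). The :today parameter is a hypothetical replacement for current_date, fixed to 2017-11-14 to match the first result shown above, and the dates are assumed to be ISO-8601 text.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Recursive CTEs stand in for generate_series: each step adds one day until last_day.
EXPLODED_GAPS_SQL = """
WITH RECURSIVE
pay_dates(emp_id, pd, last_day) AS (
    SELECT emp_id, start_date, COALESCE(end_date, :today) FROM hourly_pay
    UNION ALL
    SELECT emp_id, date(pd, '+1 day'), last_day FROM pay_dates WHERE pd < last_day
),
assignment_dates(emp_id, start_date, wd, last_day) AS (
    SELECT emp_id, start_date, start_date, COALESCE(end_date, :today) FROM work_assignments
    UNION ALL
    SELECT emp_id, start_date, date(wd, '+1 day'), last_day
    FROM assignment_dates WHERE wd < last_day
)
SELECT a.emp_id, MIN(a.wd) AS from_date, MAX(a.wd) AS thru_date
FROM assignment_dates AS a
WHERE NOT EXISTS (  -- anti-join: drop assignment days that have a pay day
    SELECT 1 FROM pay_dates AS p WHERE p.emp_id = a.emp_id AND p.pd = a.wd
)
GROUP BY a.emp_id, a.start_date
ORDER BY a.emp_id, from_date
"""

gaps = make_db().execute(EXPLODED_GAPS_SQL, {"today": "2017-11-14"}).fetchall()
print(gaps)
```

As in the original query, grouping by the assignment reports the first and last uncovered day of each assignment, so a gap in the middle of an assignment would be reported as one span.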

Although it seems counterintuitive to take one record and blow it up into many records, two things are worth considering:

  • Dates are very finite creatures: even over 10 years, that is only 4,000 or so records, which is of little concern to a database, even when multiplied by the number of employees. Your time frame looks much smaller.

  • I have had very, VERY bad luck using joins on conditions other than =, for example BETWEEN or >. It seems that in the background the database builds a Cartesian product and then filters the results. By comparison, exploding the ranges at least gives you some control over how the data explosion occurs.
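
To put the first point in numbers, a quick back-of-the-envelope check: a decade is roughly 3,650 days, so exploding date ranges produces on the order of days × employees rows. The headcount of 1,000 below is purely a hypothetical figure.

```python
from datetime import date

# Days in the ten calendar years 2008-2017 (three leap years: 2008, 2012, 2016).
days_in_decade = (date(2018, 1, 1) - date(2008, 1, 1)).days

# Hypothetical headcount, only to show the order of magnitude after exploding.
employees = 1_000
exploded_rows = days_in_decade * employees

print(days_in_decade, exploded_rows)
```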

For fun, I ran this against your sample data above and got the following, which looks accurate:

    emp_id | from_date  | thru_date
    1      | 2017-05-10 | 2017-05-19
    2      | 2017-05-08 | 2018-03-20

Let me know if this is unclear.

---

You can handle the second part with this query:

    select distinct wc.emp_id
    from (select emp_id, min(start_date) start_date
          from work_assignments
          group by emp_id) wc
    join (select emp_id, min(start_date) start_date
          from hourly_pay
          group by emp_id) hr
      on wc.emp_id = hr.emp_id
    where wc.start_date < hr.start_date

---

I would use NOT EXISTS / EXISTS:

    select wa.emp_id
    from work_assignments wa
    where not exists (select 1 from hourly_pay hp where wa.emp_id = hp.emp_id);

And for the second:

    select wa.*
    from work_assignments wa
    where not exists (
      select 1
      from hourly_pay hp
      where wa.emp_id = hp.emp_id
        and hp.start_date <= wa.start_date
    );

The question is very specific about (2). However, I would expect you to want hourly pay for the entire period of the assignment, not just for the start date. If so, the OP should ask a new question.
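
A quick check of the second NOT EXISTS query against the sample data, sketched with Python's sqlite3 module as a stand-in for PostgreSQL (ISO-8601 text dates assumed). Note that it also returns the emp_id 2 assignment: an employee with no hourly_pay rows trivially has no pay record starting on or before the assignment.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Assignments for which no hourly_pay row of the same employee starts on or
# before the assignment's start_date.
NO_EARLIER_PAY_SQL = """
SELECT wa.*
FROM work_assignments AS wa
WHERE NOT EXISTS (
    SELECT 1
    FROM hourly_pay AS hp
    WHERE wa.emp_id = hp.emp_id
      AND hp.start_date <= wa.start_date
)
ORDER BY wa.emp_id, wa.start_date
"""

rows = make_db().execute(NO_EARLIER_PAY_SQL).fetchall()
print(rows)
```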

---

Do the date comparison in an inner query, then wrap it and filter for the rows that meet the late-pay criterion:

    select *
    from (
      select distinct c.emp_id,
             case when c.start_date < hr.start_date then 1 else 0 end as latePay
      from work_assignments c
      left join hourly_pay hr USING (emp_id)
    ) result
    where latePay = 1

---

You can solve this with daterange, because basically you want to identify the ranges that are missing from the hourly_pay table.

I used the following operators:

  • + range union
  • - range difference (subtraction)
  • && range overlap check
  • @> range containment test

With these and a simple left join, you can write a query to find out which ranges are missing from the hourly_pay table.

    select wa.emp_id, lower(dr) start_date, upper(dr) - 1 end_date
    from work_assignments wa
    left join hourly_pay hp
      on wa.emp_id = hp.emp_id
     and daterange(wa.start_date, wa.end_date, '[]') && daterange(hp.start_date, hp.end_date, '[]')
    cross join lateral (
      select case
               when hp is null then daterange(wa.start_date, wa.end_date, '[]')
               else daterange(wa.start_date, wa.end_date, '[]')
                    + daterange(hp.start_date, hp.end_date, '[]')
                    - daterange(hp.start_date, hp.end_date, '[]')
             end dr
    ) dr
    where not exists (
      select 1
      from hourly_pay p
      where p.emp_id = wa.emp_id
        and daterange(p.start_date, p.end_date, '[]') @> dr
    )

    -- emp_id | start_date | end_date
    -- -------+------------+-----------
    --      1 | 2017-05-01 | 2017-05-19
    --      2 | 2017-05-08 | (null)

http://sqlfiddle.com/#!17/4bac0/14
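
The range arithmetic can also be sketched outside the database. One caveat with the query above: in PostgreSQL, range subtraction raises an error when the result would not be contiguous (for example, when a pay period sits strictly inside an assignment). The Python sketch below is a swapped-in technique, not the answer's daterange code: it subtracts inclusive (start, end) date intervals and returns every uncovered gap, treating a NULL end_date as unbounded (date.max). The names closed, subtract and missing_ranges are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Interval = Tuple[date, date]  # inclusive [start, end]

# Sample data from the question: (emp_id, start_date, end_date)
WORK_ASSIGNMENTS = [
    (1, date(2017, 5, 10), date(2017, 5, 30)),
    (1, date(2017, 6, 5), None),
    (2, date(2017, 5, 8), None),
]
HOURLY_PAY = [
    (1, date(2017, 5, 20), date(2017, 6, 30)),
    (1, date(2017, 7, 1), None),
]


def closed(start: date, end: Optional[date]) -> Interval:
    """Inclusive interval; a NULL end_date is treated as unbounded (date.max)."""
    return (start, end if end is not None else date.max)


def subtract(base: Interval, cuts: List[Interval]) -> List[Interval]:
    """Return the parts of `base` that no interval in `cuts` covers."""
    gaps: List[Interval] = []
    cursor = base[0]  # first day of `base` not yet known to be covered
    for lo, hi in sorted(cuts):
        if hi < cursor or lo > base[1]:
            continue  # this cut does not touch the uncovered remainder
        if lo > cursor:
            gaps.append((cursor, lo - timedelta(days=1)))
        if hi >= base[1]:
            return gaps  # everything from here to the end of `base` is covered
        cursor = hi + timedelta(days=1)
    gaps.append((cursor, base[1]))
    return gaps


def missing_ranges() -> List[Tuple[int, date, Optional[date]]]:
    """(emp_id, first_day, last_day) for every assignment stretch without pay."""
    result = []
    for emp_id, start, end in WORK_ASSIGNMENTS:
        cuts = [closed(s, e) for pay_emp, s, e in HOURLY_PAY if pay_emp == emp_id]
        for lo, hi in subtract(closed(start, end), cuts):
            result.append((emp_id, lo, None if hi == date.max else hi))
    return result


print(missing_ranges())
```

Unlike a single range subtraction, subtract returns two gaps when a pay period sits in the middle of an assignment, instead of failing.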

---

Maybe the wording is throwing me off a little, but wouldn't this be enough? It returns any emp_id that has a record whose hourly_pay start date is after the start date of a work assignment.

    select distinct wc.emp_id
    from work_assignments wc
    left join hourly_pay hr USING (emp_id)
    where hr.start_date > wc.start_date

---

If I understand correctly, this query should work:

  • The first join gets the hourly_pay rows whose start_date is later than the work assignment's start_date.

  • The second join keeps only the earliest of those rows, by discarding any hourly_pay row for which the same employee has an earlier hourly_pay start_date.

  • The first LEFT JOIN can be made an inner join if you do not want to see employees who have no data in the hourly_pay table [emp_id = 2].

    select hr.emp_id, h.start_date
    from work_assignments hr
    left join hourly_pay h
      on hr.emp_id = h.emp_id
     and hr.start_date < h.start_date
    left join hourly_pay h2
      on h2.emp_id = h.emp_id
     and h.start_date > h2.start_date
    where h2.start_date is null
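
Here is the query run against the sample data in SQLite (a stand-in for PostgreSQL, ISO-8601 text dates assumed). This sketch selects hr.emp_id, the work assignment's employee, so that emp_id 2 shows up with a NULL start_date.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# h: pay rows starting after the assignment; h2: any earlier pay row for the
# same employee. Keeping h2 IS NULL leaves only the employee's earliest pay row.
EARLIEST_LATE_PAY_SQL = """
SELECT hr.emp_id, h.start_date
FROM work_assignments AS hr
LEFT JOIN hourly_pay AS h
  ON hr.emp_id = h.emp_id AND hr.start_date < h.start_date
LEFT JOIN hourly_pay AS h2
  ON h2.emp_id = h.emp_id AND h.start_date > h2.start_date
WHERE h2.start_date IS NULL
ORDER BY hr.emp_id
"""

rows = make_db().execute(EARLIEST_LATE_PAY_SQL).fetchall()
print(rows)
```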

---

    select distinct p.emp_id
    from hourly_pay p
    join work_assignments w on p.emp_id = w.emp_id
    where p.start_date > w.start_date

This follows the requirement stated in the original question: find the entries where the hourly_pay start_date is later than the work_assignments start_date. Again, given the data here, the query should return emp_id 1 (because work_assignments.start_date is May-10-2017, and the earliest hourly_pay.start_date is May-20-2017).

This means the OP only wants the employee IDs.

---

The second query is very simple.

Try this query:

    select distinct h.emp_id
    from work_assignments w
    inner join hourly_pay h
      on w.emp_id = h.emp_id
     and h.start_date > w.start_date;

---

Looking at your data, I can make the following assumptions:

1) Each employee has at most one record with a null end_date, and this applies to both tables.

2) The dates of multiple records for the same employee do not overlap. When an employee has several records (for example, emp 1), he/she cannot have dates like [Jan 1 - Feb 1] and a next record like [Jan 15 - Feb 20] or [Jan 15 - null]; the records must cover non-overlapping periods.

Given this, the query below should work for you.

    SELECT hourly_pay.*
    FROM work_assignments
    INNER JOIN hourly_pay USING (emp_id)
    WHERE hourly_pay.start_date > work_assignments.start_date
      AND (
        hourly_pay.start_date < work_assignments.end_date
        OR (work_assignments.end_date is null AND hourly_pay.end_date is null)
      );

Explanation: the query joins both tables on emp_id, then keeps the records where

1) start_date in hourly_pay > start_date in work_assignments

-AND-

2) start_date in hourly_pay < end_date in work_assignments (this is necessary so that we avoid comparing entries from unrelated time periods in the two tables)

-OR-

the end dates of both entries are null; by assumption 1 (above), there can be at most one record per employee whose end_date is null.

Based on your data, this query should return both emp 1 records in hourly_pay, since their start_date is later than the matching start_date in work_assignments.

If you just need a list of emp IDs, you can simply select that one column: SELECT DISTINCT hourly_pay.emp_id ...(rest of the query)
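
To check the claim about the sample data, here is the query run in SQLite (a stand-in for PostgreSQL, with ISO-8601 text dates assumed). It relies on the two assumptions listed above; with overlapping periods the result could differ.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Pay rows that start after an assignment's start and either start before that
# assignment ends, or are the open-ended pay row matched to the open-ended assignment.
LATE_PAY_ROWS_SQL = """
SELECT hourly_pay.*
FROM work_assignments
INNER JOIN hourly_pay USING (emp_id)
WHERE hourly_pay.start_date > work_assignments.start_date
  AND (hourly_pay.start_date < work_assignments.end_date
       OR (work_assignments.end_date IS NULL AND hourly_pay.end_date IS NULL))
ORDER BY hourly_pay.start_date
"""

rows = make_db().execute(LATE_PAY_ROWS_SQL).fetchall()
print(rows)
```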

---

http://sqlfiddle.com/#!17/f4595/1

  1. Employees with no entries in the hourly_pay table:

Instead of using a LEFT JOIN and then filtering for NULL entries, I suggest using NOT EXISTS; it can be faster.

    SELECT w.emp_id, 'missing in the hourly_pay table'
    FROM work_assignments w
    WHERE NOT EXISTS (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id)

  2. Employees whose hourly_pay start_date is later than the work_assignments start_date:

    SELECT w.emp_id
    FROM work_assignments w
    WHERE NOT EXISTS (
        SELECT 1
        FROM hourly_pay hp
        WHERE hp.start_date < w.start_date
          AND w.emp_id = hp.emp_id
    )

The second query actually includes the results of the first query, so you can combine them as shown below:

    SELECT w.emp_id,
           (CASE WHEN EXISTS (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id)
                 THEN 'hourly_pay start_date is later'
                 ELSE 'missing in the hourly_pay table'
            END)
    FROM work_assignments w
    WHERE NOT EXISTS (
        SELECT 1
        FROM hourly_pay hp
        WHERE hp.start_date < w.start_date
          AND w.emp_id = hp.emp_id
    )
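
A sketch of the combined query against the sample data, using Python's sqlite3 module as a stand-in for PostgreSQL (ISO-8601 text dates assumed). The reason alias is added here only for readability.

```python
import sqlite3

# Sample rows from the question; dates as ISO-8601 text (an assumption for SQLite).
WORK_ASSIGNMENTS = [
    (1, "2017-05-10", "2017-05-30"),
    (1, "2017-06-05", None),
    (2, "2017-05-08", None),
]
HOURLY_PAY = [
    (1, "2017-05-20", "2017-06-30", 75),
    (1, "2017-07-01", None, 80),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_assignments (emp_id INTEGER, start_date TEXT, end_date TEXT)")
    conn.execute(
        "CREATE TABLE hourly_pay (emp_id INTEGER, start_date TEXT, end_date TEXT, rate INTEGER)"
    )
    conn.executemany("INSERT INTO work_assignments VALUES (?, ?, ?)", WORK_ASSIGNMENTS)
    conn.executemany("INSERT INTO hourly_pay VALUES (?, ?, ?, ?)", HOURLY_PAY)
    return conn


# Keep assignments with no pay row starting before them, then label why:
# some pay exists (it just starts later) or no pay exists at all.
COMBINED_SQL = """
SELECT w.emp_id,
       CASE WHEN EXISTS (SELECT 1 FROM hourly_pay h WHERE h.emp_id = w.emp_id)
            THEN 'hourly_pay start_date is later'
            ELSE 'missing in the hourly_pay table'
       END AS reason
FROM work_assignments AS w
WHERE NOT EXISTS (
    SELECT 1
    FROM hourly_pay AS hp
    WHERE hp.start_date < w.start_date
      AND w.emp_id = hp.emp_id
)
ORDER BY w.emp_id
"""

rows = make_db().execute(COMBINED_SQL).fetchall()
print(rows)
```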

---

This will work fine:

    SELECT DISTINCT emp_id
    FROM work_assignments
    JOIN hourly_pay hr USING (emp_id)
    WHERE hr.start_date < work_assignments.start_date;
