Compare different orders of one table

Question


I have the following scenario: a table with these columns:

table_id|user_id|os_number|inclusion_date

In the system, os_number is sequential per user, but due to a system error some users' OS records were inserted in the wrong order. Something like this:

 table_id | user_id | os_number | inclusion_date
 ----------------------------------------------
        1 |       1 |         1 | 2015-11-01
        2 |       1 |         2 | 2015-11-02
        3 |       1 |         3 | 2015-11-01

Note that os_number 3 was inserted before os_number 2.

What I need:

Retrieve the table_id of rows 2 and 3, which are out of order.

I have two selects that show me table_id in two different orders:

 select table_id from table order by user_id, os_number;
 select table_id from table order by user_id, inclusion_date;

I can't figure out how to compare these two selects to see which users are affected by this system error.


sql postgresql

Wellington zanelli Nov 30 '15 at 12:03


6 answers

Gordon linoff · Answer 1 · 2015-11-30T14:44:13+0000

Your question is a little tricky because there is no proper ordering (as presented), since dates can have ties. So, use rank() or dense_rank() to number the rows under both orderings, compare the two values, and return the rows where they disagree:

 select t.*
 from (select t.*,
              rank() over (partition by user_id order by inclusion_date) as seqnum_d,
              rank() over (partition by user_id order by os_number) as seqnum_o
       from t
      ) t
 where seqnum_d <> seqnum_o;
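To try this against the sample rows from the question, here is a minimal, self-contained sketch. The temp table name os_demo_rank and the column types are assumptions made for illustration; only the three rows come from the question.

```sql
-- Hypothetical temp table mirroring the question's sample rows
-- (table name and column types are assumptions).
CREATE TEMP TABLE os_demo_rank (
    table_id       integer PRIMARY KEY,
    user_id        integer NOT NULL,
    os_number      integer NOT NULL,
    inclusion_date date    NOT NULL
);

INSERT INTO os_demo_rank (table_id, user_id, os_number, inclusion_date) VALUES
    (1, 1, 1, DATE '2015-11-01'),
    (2, 1, 2, DATE '2015-11-02'),
    (3, 1, 3, DATE '2015-11-01');

-- Rank each user's rows by date and by os_number, then keep the
-- rows where the two ranks disagree.
SELECT r.table_id, r.seqnum_d, r.seqnum_o
FROM (
    SELECT d.*,
           rank() OVER (PARTITION BY user_id ORDER BY inclusion_date) AS seqnum_d,
           rank() OVER (PARTITION BY user_id ORDER BY os_number)      AS seqnum_o
    FROM os_demo_rank d
) r
WHERE r.seqnum_d <> r.seqnum_o
ORDER BY r.table_id;

--  table_id | seqnum_d | seqnum_o
-- ----------+----------+----------
--         2 |        3 |        2
--         3 |        1 |        3
```

Rows 1 and 3 share the same date, so rank() gives both of them 1 in the date ordering. That is why row 1 is not reported while rows 2 and 3 are.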

klin · Answer 2 · 2015-11-30T13:22:35+0000

Use row_number() for both orders:

 select *
 from (
     select *,
         row_number() over (order by os_number) rnn,
         row_number() over (order by inclusion_date) rnd
     from a_table
     ) s
 where rnn <> rnd;

  table_id | user_id | os_number | inclusion_date | rnn | rnd
 ----------+---------+-----------+----------------+-----+-----
         3 |       1 |         3 | 2015-11-01     |   3 |   2
         2 |       1 |         2 | 2015-11-02     |   2 |   3
 (2 rows)

momar · Answer 3 · 2015-11-30T14:23:32+0000

Not quite sure about the performance, but you can use cross apply on the same table to get the results in a single query. This will return the pairs of table_ids that are out of order.

 select a.table_id as InsertedAfterTableId,
        c.table_id as InsertedBeforeTableId
 from table a
 cross apply (
     select b.table_id
     from table b
     where b.inclusion_date < a.inclusion_date
       and b.os_number > a.os_number
 ) c
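Note that cross apply is SQL Server syntax. Since the question is tagged postgresql, a rough equivalent there would use CROSS JOIN LATERAL (available from PostgreSQL 9.3). The sketch below is an adaptation under stated assumptions rather than the answer's original query: the temp table os_demo_lateral and its column types are made up for the demo, the column aliases are switched to snake_case because PostgreSQL folds unquoted identifiers to lowercase, and the user_id match is added on the assumption that rows should only be compared within the same user.

```sql
-- Hypothetical temp table mirroring the question's sample rows
-- (table name and column types are assumptions).
CREATE TEMP TABLE os_demo_lateral (
    table_id       integer PRIMARY KEY,
    user_id        integer NOT NULL,
    os_number      integer NOT NULL,
    inclusion_date date    NOT NULL
);

INSERT INTO os_demo_lateral (table_id, user_id, os_number, inclusion_date) VALUES
    (1, 1, 1, DATE '2015-11-01'),
    (2, 1, 2, DATE '2015-11-02'),
    (3, 1, 3, DATE '2015-11-01');

-- For each row a, find rows b that were inserted earlier
-- but carry a higher os_number.
SELECT a.table_id AS inserted_after_table_id,
       c.table_id AS inserted_before_table_id
FROM os_demo_lateral a
CROSS JOIN LATERAL (
    SELECT b.table_id
    FROM os_demo_lateral b
    WHERE b.user_id = a.user_id              -- assumption: compare within one user
      AND b.inclusion_date < a.inclusion_date
      AND b.os_number > a.os_number
) c;

--  inserted_after_table_id | inserted_before_table_id
-- -------------------------+--------------------------
--                        2 |                        3
```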

Gabriel's messanger · Answer 4 · 2015-11-30T12:37:56+0000

I would use window functions to get row numbers in the given orders and then compare them:

 SELECT sub.table_id,
        sub.user_id,
        sub.os_number,
        sub.inclusion_date,
        number_order_1,
        number_order_2
 FROM (
     SELECT table_id,
            user_id,
            os_number,
            inclusion_date,
            -- position of the row within its user when ordered by os_number
            row_number() OVER (PARTITION BY user_id
                               ORDER BY os_number
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                              ) AS number_order_1,
            -- position of the row within its user when ordered by inclusion_date
            row_number() OVER (PARTITION BY user_id
                               ORDER BY inclusion_date
                               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                              ) AS number_order_2
     FROM table
 ) sub
 -- compare the two different row numbers (comparing number_order_1 with itself never matches)
 WHERE number_order_1 <> number_order_2;

EDIT:

a_horse_with_no_name made a good point about my last answer. I have reverted to my first answer (see the edit history), which works even when os_number values are not contiguous.

Benjamin bau · Answer 5 · 2015-11-30T13:20:15+0000

Both of the following queries simply check for a mismatch between inclusion_date and os_number:

This first query should return the offending row (the one whose os_number is out of step with its inclusion date); in the example, that is row 3.

 select table.table_id, table.user_id, table.os_number
 from table
 where EXISTS(select *
              from table t
              where t.user_id = table.user_id
                and t.inclusion_date > table.inclusion_date
                and t.os_number < table.os_number);

This second query returns the table_ids and the user_id for each pair of mismatched rows:

 select first_table.table_id, second_table.table_id, first_table.user_id
 from table first_table
 JOIN table second_table
   ON (first_table.user_id = second_table.user_id
       and first_table.inclusion_date > second_table.inclusion_date
       and first_table.os_number < second_table.os_number);

Julien blanchard · Answer 6 · 2015-11-30T15:50:22+0000

 select *
 from (
     select a_table.*,
            lag(inclusion_date) over (partition by user_id order by os_number) as last_date
     from a_table
 ) result
 where last_date is not null
   AND last_date > inclusion_date;

This should cover gaps as well as ties. Basically, I just check the inclusion_date of the previous os_number and make sure that it is not strictly greater than the current date (so two versions with the same date are fine).
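The gap and tie behaviour can be checked with a small sketch. The temp table os_demo_lag and its column types are assumptions. The rows for user 1 are the question's sample, while the two rows for user 2 are hypothetical data added only to show a gap in os_number combined with a tie on the date.

```sql
-- Hypothetical temp table (table name and column types are assumptions).
CREATE TEMP TABLE os_demo_lag (
    table_id       integer PRIMARY KEY,
    user_id        integer NOT NULL,
    os_number      integer NOT NULL,
    inclusion_date date    NOT NULL
);

INSERT INTO os_demo_lag (table_id, user_id, os_number, inclusion_date) VALUES
    (1, 1, 1, DATE '2015-11-01'),   -- sample rows from the question
    (2, 1, 2, DATE '2015-11-02'),
    (3, 1, 3, DATE '2015-11-01'),
    (4, 2, 1, DATE '2015-11-01'),   -- hypothetical: os_number gap (1 -> 3)
    (5, 2, 3, DATE '2015-11-01');   -- hypothetical: same date as the previous row

-- Compare each row's date with the date of the previous os_number
-- for the same user; flag rows that are dated earlier than their predecessor.
SELECT r.table_id, r.user_id, r.os_number, r.inclusion_date, r.last_date
FROM (
    SELECT d.*,
           lag(inclusion_date) OVER (PARTITION BY user_id ORDER BY os_number) AS last_date
    FROM os_demo_lag d
) r
WHERE r.last_date IS NOT NULL
  AND r.last_date > r.inclusion_date;

--  table_id | user_id | os_number | inclusion_date | last_date
-- ----------+---------+-----------+----------------+------------
--         3 |       1 |         3 | 2015-11-01     | 2015-11-02
```

Only row 3 is flagged. User 2's rows pass because the comparison is strict and does not depend on os_number being consecutive.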
