SQL: eliminate selection of redundant records that occur within X minutes

The DB used is Firebird 2.1, in case you are not familiar with this, select the sql ref statement:
http://ibexpert.net/ibe/index.php?n=Doc.DataRetrieval
ref functions: http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd21.html

I will be happy with any sql slang [I will convert it].

table layout:

CREATE TABLE EVENT_MASTER ( EVENT_ID BIGINT NOT NULL, EVENT_TIME BIGINT NOT NULL, DATA_F1 VARCHAR(40), DATA_F2 VARCHAR(40), PRIMARY KEY (EVENT_ID) ); 

The bad news: EVENT_TIME is stored as seconds elapsed since Epoch.

sample data:

 "EVENT_ID","EVENT_TIME","DATA_F1","DATA_F2" 25327,1297824698,"8604","A" 25328,1297824770,"8604","I" 25329,1297824773,"8604","A" 25330,1297824793,"8604","A" 25331,1297824809,"8604","1" 25332,1297824811,"8604","GREY" 25333,1297824812,"8604","A" 25334,1297824825,"8604","GREY" 25335,1297824831,"8604","A" 25336,1297824833,"8604","GREY" 25337,1297824838,"8604","A" 25338,1297824840,"8604","1" 25339,1297824850,"8604","A" 25340,1297824864,"8604","A" 25341,1297824875,"8804","GREY" //notice DATA_F1 is different 25342,1297824876,"8604","G" 25343,1297824877,"8604","A" 25344,1297824880,"8604","GREY" 25345,1297824895,"8604","1" 25346,1297824899,"8604","A" 25347,1297824918,"8604","GREY" 25348,1297824930,"8604","YELLOW" 25349,1297824939,"8604","GREY" 25350,1297824940,"8604","" 25351,1297824944,"8604","A" 25352,1297824945,"8604","1" 25353,1297824954,"8604","B" 25354,1297824958,"8604","" 25355,1297824964,"8604","1" 25356,1297824966,"8604","GREY" 25357,1297824974,"8604","1" 25358,1297824981,"8604","GREY" 25359,1297824983,"8604","" 25360,1297824998,"8604","GREY" 25361,1297825003,"8604","2" 25362,1297825009,"8604","G" 25363,1297825018,"8604","GREY" 25364,1297825026,"8604","F" 25365,1297825045,"8604","GREY" 25366,1297825046,"8604","1" 

Expected Result:
excellent lines "DATA_F1", "DATA_F2" for X minutes according to EVENT_TIME: like:

 25341,1297824875,"8804","GREY" 25327,1297824698,"8604","A" 25328,1297824770,"8604","I" 25332,1297824811,"8604","GREY" 25348,1297824930,"8604","YELLOW" ..etc 

Requirements: exclude the selection of redundant records that occur within 5 minutes [calculation range in accordance with the EVENT_TIME column].

last, I'm trying to execute this template:

 SELECT * FROM EVENT_MASTER inner join ( SELECT distinct DATA_F1, DATA_F2 FROM EVENT_MASTER where /*the hard stuff that i need help with: (EVENT_TIME difference within X minutes)*/ ) as RemovedDup ON /*EVENT_MASTER.EVENT_ID = problem is i cant select RemovedDup ID otherwise distinct becomes useless!!*/ 

please help as soon as possible.

thanks,

EDIT

adding output based on Andrei K.'s answer:

 25331,1297824809,"8604","1" 25327,1297824698,"8604","A" 25342,1297824876,"8604","G" 25332,1297824811,"8604","GREY" 25328,1297824770,"8604","I" 25341,1297824875,"8804","GREY" 25350,1297824940,"8604","" 25352,1297824945,"8604","1" /*bug: time still within 300 seconds, this same as first record*/ 25361,1297825003,"8604","2" 25351,1297824944,"8604","A" 25353,1297824954,"8604","B" 25364,1297825026,"8604","F" 25362,1297825009,"8604","G" 25347,1297824918,"8604","GREY" 25372,1297825087,"8604","ORANGE" 25348,1297824930,"8604","YELLOW" 25382,1297825216,"8604","1" 25387,1297825270,"8604","B" 25394,1297825355,"8604","BLUE" 25381,1297825211,"8604","GREY" 

EDIT 2: Russell query output: nice output and very very slow.

 1297824698,"8604","A" 1297824770,"8604","I" 1297824809,"8604","1" 1297824811,"8604","GREY" 1297824825,"8604","GREY" 1297824840,"8604","1" 1297824875,"8804","GREY" 1297824876,"8604","G" 1297824880,"8604","GREY" 1297824918,"8604","GREY" 1297824930,"8604","YELLOW" 1297824939,"8604","GREY" 1297824940,"8604","" 1297824945,"8604","1" 1297824954,"8604","B" 1297824964,"8604","1" 1297824998,"8604","GREY" 1297825003,"8604","2" 1297825018,"8604","GREY" 1297825026,"8604","F" 1297825045,"8604","GREY" 1297825046,"8604","1" 1297825063,"8604","1" 1297825079,"8604","GREY" 1297825087,"8604","ORANGE" 1297825094,"8604","GREY" 1297825100,"8604","1" 1297825133,"8604","GREY" 1297825176,"8604","GREY" 1297825216,"8604","1" 

EDIT 3:

based on Russell's request: all rows WHERE DATA_F1 = '8604' AND DATA_F2 = 'GRAY'

 25332,1297824811,"8604","GREY" 25334,1297824825,"8604","GREY" 25336,1297824833,"8604","GREY" 25344,1297824880,"8604","GREY" 25347,1297824918,"8604","GREY" 25349,1297824939,"8604","GREY" 25356,1297824966,"8604","GREY" 25358,1297824981,"8604","GREY" 25360,1297824998,"8604","GREY" 25363,1297825018,"8604","GREY" 25365,1297825045,"8604","GREY" 25367,1297825059,"8604","GREY" 25371,1297825079,"8604","GREY" 25373,1297825094,"8604","GREY" 25376,1297825116,"8604","GREY" 25378,1297825133,"8604","GREY" 25380,1297825176,"8604","GREY" 25381,1297825211,"8604","GREY" 25384,1297825234,"8604","GREY" 25389,1297825286,"8604","GREY" 25390,1297825314,"8604","GREY" 25391,1297825323,"8604","GREY" 25393,1297825343,"8604","GREY" 25396,1297825370,"8604","GREY" 25397,1297825387,"8604","GREY" 25399,1297825416,"8604","GREY" 25401,1297825436,"8604","GREY" 25402,1297825445,"8604","GREY" 25404,1297825454,"8604","GREY" 50282,1299137344,"8604","GREY" 380151,1309849420,"8604","GREY" 

From this moment [oct. 11.2101 5:00 GMT] there is no absolute correct answer, and Andrei K. answers the still best attempt among others. therefore, sql experts, please help me find a solution, otherwise I will start to think that sql is not able to handle issues! It??

Note: event_time is not unique, so multiple events may occur in one second.

+4
source share
7 answers

You will need a very nasty recursive query in order to accomplish this in a purely “functional” way. I do not pretend to be smart enough to build such a request, not to mention making it workable.

On the other hand, resolving side effects (i.e. a temporary table) greatly simplifies the work. You can even do this fast enough by adding the appropriate indexes to the temporary table (not shown here). Here is the actual SQL:

 CREATE GLOBAL TEMPORARY TABLE EVENT_MASTER_TMP ( EVENT_ID BIGINT NOT NULL, EVENT_TIME BIGINT NOT NULL, DATA_F1 VARCHAR(40), DATA_F2 VARCHAR(40), PRIMARY KEY (EVENT_ID) ); INSERT INTO EVENT_MASTER_TMP SELECT * FROM (SELECT * FROM EVENT_MASTER ORDER BY EVENT_TIME) E WHERE NOT EXISTS ( SELECT * FROM EVENT_MASTER_TMP T WHERE E.DATA_F1 = T.DATA_F1 AND E.DATA_F2 = T.DATA_F2 AND E.EVENT_TIME - T.EVENT_TIME <= 5*60 ); SELECT * FROM EVENT_MASTER_TMP; 

In plain English:

  • Go through events from older to newer,
  • for each event, check if it is redundant with respect to some row that is already in the temporary table
  • and if not, insert it into the temporary table so that it can be used as a criterion for other events.

Performing this on test data yields:

 25327 1297824698 8604 A 25328 1297824770 8604 I 25331 1297824809 8604 1 25332 1297824811 8604 GREY 25341 1297824875 8804 GREY 25342 1297824876 8604 G 25348 1297824930 8604 YELLOW 25350 1297824940 8604 25353 1297824954 8604 B 25361 1297825003 8604 2 25364 1297825026 8604 F 

Lowering the time threshold from 5*60 to, say, 233 , we get the following:

 25327 1297824698 8604 A 25328 1297824770 8604 I 25331 1297824809 8604 1 25332 1297824811 8604 GREY 25341 1297824875 8804 GREY 25342 1297824876 8604 G 25348 1297824930 8604 YELLOW 25350 1297824940 8604 25351 1297824944 8604 A <-- 246s difference 25353 1297824954 8604 B 25361 1297825003 8604 2 25364 1297825026 8604 F 25365 1297825045 8604 GREY <-- 234s difference 25366 1297825046 8604 1 <-- 237s difference 
+1
source

If by redundant lines you mean lines registered within 5 minutes and having the same data_f1, data_f2, try something like this:

 SELECT e2.event_id, e2.event_time, e2.data_f1, e2.data_f2 FROM (SELECT trunc(event_time / 300), data_f1, data_f2, min(event_id) as e_id FROM event_master GROUP BY 1, 2, 3) e1 JOIN event_master e2 ON e1.e_id = e2.event_id 
+4
source

U can try this:

 SELECT * FROM EVENT_MASTER group by (DATAF1, DATAF2) where event_time >(SELECT TIME_TO_SEC(now())-300) 

Hope this helps you.

+2
source

I am not familiar with Firebird, but I am using documentation, so if this is correct, this should work.

 SELECT DISTINCT MIN(A.EVENT_TIME) as MINEVENT_TIME, B.DATA_F1, B.DATA_F2 FROM EVENT_MASTER as A JOIN EVENT_MASTER as B ON A.EVENT_TIME BETWEEN B.EVENT_TIME-299 AND B.EVENT_TIME AND B.DATA_F1 = A.DATA_F1 AND B.DATA_F2 = A.DATA_F2 GROUP BY B.DATA_F1, B.DATA_F2, B.EVENT_TIME 

This syntax is validated but not validated.

+2
source

This assumes that all records have different values ​​in event_time (or they will be mutually exclusive).

 SELECT * FROM event_master AS data WHERE NOT EXISTS ( SELECT * FROM event_master WHERE event_time > data.event_time - 300 AND event_time <= data.event_time ) 

If multiple events with the same value in event_time can occur, can we assume that an event with a higher event_id CANNOT occur before an event with a lower event_id ? If so, you can change it above as shown below.

 SELECT * FROM event_master AS data WHERE NOT EXISTS ( SELECT * FROM event_master WHERE event_time > data.event_time - 300 AND event_time <= data.event_time AND event_id < data.event_id ) 

In case of simultaneous execution of several events, the one with the lowest event_id is selected.


In terms of performance, make sure the data is indexed, where event_time is the first indexed field.

+2
source

As I understand it, you want to get different values ​​for DATA_F1 and DATA_F2, but only for the "window" 5 minutes; after that, the values ​​may repeat, right? (Sorry, if I missed the question, it was a long day ...) I don't know much about Firebird, but here's how you do it on MS SQL Server:

 SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM EVENT_MASTER AS a LEFT JOIN EVENT_MASTER AS b ON a.DATA_F1=b.DATA_F1 AND a.DATA_F2=b.DATA_F2 AND a.EVENT_TIME<b.EVENT_TIME AND b.EVENT_TIME-a.EVENT_TIME<=5*60 WHERE b.EVENT_ID IS NULL 

Also, while you are checking this out, also try the modified version below: Hope this helps!

 SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM EVENT_MASTER AS a LEFT JOIN EVENT_MASTER AS b ON a.DATA_F1=b.DATA_F1 AND a.DATA_F2=b.DATA_F2 AND a.EVENT_ID<b.EVENT_ID AND a.EVENT_TIME<=b.EVENT_TIME AND b.EVENT_TIME-a.EVENT_TIME<=5*60 WHERE b.EVENT_ID IS NULL 

ADDED: Well, it looks like we have the right results. Here's my suggestion to optimize this child (since I saw that Firebird supports the EXISTS keyword, I rewrote the query below):

 SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM EVENT_MASTER AS a WHERE NOT EXISTS (SELECT * FROM EVENT_MASTER AS b WHERE a.DATA_F1=b.DATA_F1 AND a.DATA_F2=b.DATA_F2 AND a.EVENT_ID<b.EVENT_ID AND a.EVENT_TIME<=b.EVENT_TIME AND b.EVENT_TIME-a.EVENT_TIME<=5*60) 

Also add the following index:

 CREATE INDEX IX_SPEED ON EVENT_MASTER (EVENT_ID DESC, EVENT_TIME ASC, DATA_F1 ASC, DATA_F2 ASC) 

Hope this helps!

+2
source

to try:

 SELECT T1.* FROM EVENT_MASTER T1 WHERE EXISTS ( SELECT * FROM EVENT_MASTER T2 WHERE T2.DATA_F1=T1.DATA_F1 AND T2.DATA_F2=T1.DATA_F2 AND (T2.EVENT_TIME-T1.EVENT_TIME)<300 ) 
+1
source

All Articles