Interpolate Missing Values in MySQL Table

I have intraday stock data stored in a MySQL table that looks like this:

+----------+-------+
| tick     | quote |
+----------+-------+
| 08:00:10 |  5778 |
| 08:00:11 |  5776 |
| 08:00:12 |  5778 |
| 08:00:13 |  5778 |
| 08:00:14 |  NULL |
| 08:00:15 |  NULL |
| 08:00:16 |  5779 |
| 08:00:17 |  5778 |
| 08:00:18 |  5780 |
| 08:00:19 |  NULL |
| 08:00:20 |  5781 |
| 08:00:21 |  5779 |
| 08:00:22 |  5779 |
| 08:00:23 |  5779 |
| 08:00:24 |  5778 |
| 08:00:25 |  5779 |
| 08:00:26 |  5777 |
| 08:00:27 |  NULL |
| 08:00:28 |  NULL |
| 08:00:29 |  5776 |
+----------+-------+

As you can see, there are moments when no data is available (quote is NULL). What I would like to do is a simple step interpolation: each NULL value should be updated with the last available value. The only way I have managed to do this is with cursors, which are rather slow given the large amount of data. I'm basically looking for something like this:

    UPDATE table AS t1
    SET quote = (SELECT quote
                 FROM table AS t2
                 WHERE t2.tick < t1.tick
                   AND t2.quote IS NOT NULL
                 ORDER BY t2.tick DESC
                 LIMIT 1)
    WHERE quote IS NULL

Of course, this query does not work (MySQL does not allow a subquery to select from the table being updated), but it shows what I am after.

I would appreciate any ideas on how this can be solved without cursors and temporary tables.

+4
3 answers

This should work:

    SET @prev = NULL;
    UPDATE ticks
    SET quote = @prev := coalesce(quote, @prev)
    ORDER BY tick;

By the way, the same trick works for reading:

    SELECT t.tick, @prev := coalesce(t.quote, @prev)
    FROM ticks t
    JOIN (SELECT @prev := NULL) AS x  -- initializes @prev
    ORDER BY tick
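To see what the user-variable trick computes, here is a minimal Python sketch that mimics it row by row. It is only an illustration of the logic, not MySQL code: the function name `forward_fill` is hypothetical, and the sample values are the rows 08:00:12 to 08:00:17 from the question.

```python
from typing import List, Optional, Sequence


def forward_fill(quotes: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Mimic `@prev := coalesce(quote, @prev)` applied in tick order."""
    prev: Optional[int] = None  # plays the role of @prev, initialised to NULL
    filled: List[Optional[int]] = []
    for quote in quotes:
        # coalesce(quote, @prev): keep the quote if present, else reuse the last one
        prev = quote if quote is not None else prev
        filled.append(prev)
    return filled


# Quotes for ticks 08:00:12 .. 08:00:17 from the question's table
sample = [5778, 5778, None, None, 5779, 5778]
result = forward_fill(sample)
print(result)  # [5778, 5778, 5778, 5778, 5779, 5778]
```

Note that a NULL before the first known quote stays NULL, because there is nothing to carry forward yet. The result also depends on the rows being visited in tick order, which is why both statements above use ORDER BY tick.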
+5

The main problem here is the reference to the outer query in t2.tick < t1.tick. Because of it, you cannot simply wrap the subquery in another subquery.

If this is a one-off query and there is not much data, you can do something like this:

    UPDATE `table` AS t1
    SET quote = (SELECT quote
                 FROM (SELECT quote, tick
                       FROM `table` AS t2
                       WHERE t2.quote IS NOT NULL) AS t3
                 WHERE t3.tick < t1.tick
                 ORDER BY t3.tick DESC
                 LIMIT 1)
    WHERE quote IS NULL

But really, really, do not use this on a large table, as it is likely to be slow. For every NULL quote, this query selects all non-NULL rows from the table, and only then picks the desired row from that result.
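The cost described above can be sketched in Python. This is a rough model of the explanation in this answer, not a measurement of what MySQL's optimizer actually does; the function name `fill_by_rescanning` and the row counter are hypothetical.

```python
from typing import List, Optional, Tuple


def fill_by_rescanning(quotes: List[Optional[int]]) -> Tuple[List[Optional[int]], int]:
    """Fill each NULL by rebuilding the list of all non-NULL rows, as described above."""
    rows_examined = 0
    filled = list(quotes)
    for i, quote in enumerate(quotes):
        if quote is not None:
            continue
        # Models the derived table t3: every non-NULL row, rebuilt for this NULL row
        candidates = [(j, q) for j, q in enumerate(quotes) if q is not None]
        rows_examined += len(candidates)
        # WHERE t3.tick < t1.tick ORDER BY t3.tick DESC LIMIT 1
        earlier = [(j, q) for j, q in candidates if j < i]
        if earlier:
            filled[i] = max(earlier)[1]
    return filled, rows_examined


# Quotes for ticks 08:00:12 .. 08:00:17 from the question's table
sample = [5778, 5778, None, None, 5779, 5778]
filled, examined = fill_by_rescanning(sample)
print(filled)    # [5778, 5778, 5778, 5778, 5779, 5778]
print(examined)  # 8
```

In this model the work grows as (number of NULL rows) x (number of non-NULL rows), so two gaps among four known quotes already cost eight row reads, whereas a single ordered pass reads each row once.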

0

I would create a (temporary) table with the same layout as your table, and execute the following two queries:

Insert all interpolations into the temp_stock table:

    INSERT INTO temp_stock (tick, quote)
    SELECT s2.tick,
           ((SELECT s1.quote FROM stock s1
             WHERE s1.tick < s2.tick AND s1.quote IS NOT NULL
             ORDER BY s1.tick DESC LIMIT 1)   -- nearest earlier quote
          + (SELECT s3.quote FROM stock s3
             WHERE s3.tick > s2.tick AND s3.quote IS NOT NULL
             ORDER BY s3.tick ASC LIMIT 1)    -- nearest later quote
           ) / 2 AS quote
    FROM stock s2
    WHERE s2.quote IS NULL;

Update the stock table with the temp values:

    UPDATE stock s
    INNER JOIN temp_stock ts ON (ts.tick = s.tick)
    SET s.quote = ts.quote;

It uses a temporary table (make sure it is a memory table for speed), but it does not need a cursor.
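Note that this approach averages the neighbouring quotes rather than carrying the last value forward, so it gives different numbers from the step fill asked for in the question. A minimal Python sketch of that averaging follows; the function name `midpoint_fill` is hypothetical, and it assumes "neighbour" means the nearest earlier and later ticks that have a non-NULL quote.

```python
from typing import List, Optional, Union

Number = Union[int, float]


def midpoint_fill(quotes: List[Optional[int]]) -> List[Optional[Number]]:
    """Replace each NULL with the mean of the nearest known quotes on either side."""
    filled: List[Optional[Number]] = list(quotes)
    for i, quote in enumerate(quotes):
        if quote is not None:
            continue
        # Nearest earlier and later non-NULL quotes (assumption: NULL neighbours are skipped)
        before = next((q for q in reversed(quotes[:i]) if q is not None), None)
        after = next((q for q in quotes[i + 1:] if q is not None), None)
        if before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


# Quotes for ticks 08:00:12 .. 08:00:17 from the question's table
sample = [5778, 5778, None, None, 5779, 5778]
averaged = midpoint_fill(sample)
print(averaged)  # [5778, 5778, 5778.5, 5778.5, 5779, 5778]
```

A gap at the very start or end of the data has only one known neighbour, so in this sketch it stays NULL.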

0
