Python sqlite3 module is much slower for a SELECT than the shell

I use the sqlite3 module in Python, but it is incredibly slow for one specific SELECT query compared with running the same query in the sqlite3 command-line shell. I will start by saying that both report the same version, 3.7.17.

My query

    SELECT r.ID, r.Date
    FROM my_table r
    WHERE r.Date IN (
        SELECT Date FROM my_table WHERE ID = r.ID GROUP BY Date LIMIT 2
    );

Python code

    import sqlite3 as lite

    con = lite.connect(path_to_database)
    cur = con.cursor()
    with con:
        cur.execute(sql_query)

where sql_query is a string variable containing the original query.

My guess is that the problem lies in how the IN subquery is optimized.

Performance details: my_table contains 167,000 rows. The query takes about 10 seconds in the shell and more than 5 minutes in Python (I stopped it at that point).

For now, since the query is only used to create a table, I just copy and paste it into the shell as a workaround. How can I fix this so that I can run the query from Python?

Addition

When I run EXPLAIN QUERY PLAN, I get the following:

Shell:

    0 0 0 SCAN TABLE PIT_10_Days AS r (~500000 rows)
    0 0 0 EXECUTE CORRELATED LIST SUBQUERY 1
    1 0 0 SEARCH TABLE PIT_10_Days USING AUTOMATIC C
    1 0 0 USE TEMP B-TREE FOR GROUP BY

Python:

    0 0 TABLE PIT_10_Days AS r
    0 0 TABLE PIT_10_Days

I'm not sure whether this difference is just an artifact of how EXPLAIN QUERY PLAN output comes back in Python, or whether it is the actual problem.
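
One way to narrow this down, as a minimal sketch: the sqlite3 module exposes the version of the SQLite library it is actually linked against as `sqlite3.sqlite_version`, and that library is not necessarily the same build as the sqlite3 shell binary. The snippet below prints that version and the query plan as seen from Python. The in-memory table and the `explain` helper are hypothetical stand-ins for illustration, not part of the original setup.

```python
import sqlite3

# Version of the SQLite C library this Python build is linked against.
# It may differ from the version reported by the sqlite3 shell.
print("SQLite library used by Python:", sqlite3.sqlite_version)


def explain(con, sql):
    """Hypothetical helper: return the EXPLAIN QUERY PLAN rows for sql."""
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()


# Assumption: an empty in-memory table stands in for the real database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (ID INTEGER, Date TEXT)")

query = """
SELECT r.ID, r.Date
FROM my_table r
WHERE r.Date IN (
    SELECT Date FROM my_table WHERE ID = r.ID GROUP BY Date LIMIT 2
)
"""

plan = explain(con, query)
for row in plan:
    print(row)
```

If the version printed here differs from the one the shell reports, the two are probably planning the query with different library versions, which could account for the different plan output.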

1 answer

Sorry for being so late, but I only just found this question. Unfortunately, I have no idea why the sqlite3 module behaves differently from the shell, but you can try to avoid the correlated subquery in the first place. I'm also not sure that your query always does what you want, because you do not order the results in your subquery.

I suppose you need the last two dates for each ID? Try the following:

    SELECT r.ID AS ID, max( r.Date ) AS Date
    FROM my_table AS r
    GROUP BY r.ID
    UNION
    SELECT r.ID, max( r.Date )
    FROM my_table AS r
    JOIN ( SELECT ID, max( Date ) AS Date
           FROM my_table
           GROUP BY ID ) AS maxDat
      ON r.ID = maxDat.ID AND r.Date != maxDat.Date
    GROUP BY r.ID;

The first SELECT picks each ID together with its latest date. The result is then combined with a similar selection in which the actual latest date is filtered out, so that you also get the second-latest date. If you need more than the last two dates this becomes rather cumbersome, but for two dates it should work well and is probably a lot faster.
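
To illustrate, here is a small sketch that runs this query from Python against an in-memory database. The sample rows are made up for the example, and the date strings are assumed to be stored in a sortable ISO-style format.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (ID INTEGER, Date TEXT)")

# Hypothetical sample data: ID 1 has three dates, ID 2 has two, ID 3 has one.
con.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        (1, "2013-01-01"), (1, "2013-01-02"), (1, "2013-01-03"),
        (2, "2013-02-01"), (2, "2013-02-05"),
        (3, "2013-03-01"),
    ],
)

sql_query = """
SELECT r.ID AS ID, max(r.Date) AS Date
FROM my_table AS r
GROUP BY r.ID
UNION
SELECT r.ID, max(r.Date)
FROM my_table AS r
JOIN (SELECT ID, max(Date) AS Date FROM my_table GROUP BY ID) AS maxDat
  ON r.ID = maxDat.ID AND r.Date != maxDat.Date
GROUP BY r.ID
"""

# Sort in Python so the printed order does not depend on the query plan.
rows = sorted(con.execute(sql_query).fetchall())
for row in rows:
    print(row)
```

With this data, each ID comes back with at most its two latest dates. An ID with only one date (ID 3 here) appears once, because the second SELECT finds no other date for it.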
