Do not waste time profiling. Time is always in database operations. Make as little as possible. Just the minimum number of inserts.
Three things.
One. Do not use SELECT again and again to match Date, Hostname, and Person parameters. Extract all ONCE data into a Python dictionary and use it in memory. Do not make repeated single selections. Use Python.
Two. Do not update.
In particular, do not do this. This is bad code for two reasons.
cursor.execute("UPDATE people SET chats_count = chats_count + 1 WHERE id = '%s'" % person_id)
It will be replaced by a simple SELECT COUNT (*) FROM .... Never update the counter. Just count the rows that are with the SELECT statement. [If you cannot do this with a simple SELECT COUNT or SELECT COUNT (DISTINCT), you are missing data - your data model should always contain the correct full values. Never update.]
A. Never create SQL using string replacement. Totally dumb.
If for some reason SELECT COUNT(*) not fast enough (check first before doing anything lame), you can cache the count result in another table. AFTER all loads. Do a SELECT COUNT(*) FROM whatever GROUP BY whatever and paste this into the counting table. Do not update. Someday.
Three. Use bind variables. Always.
cursor.execute( "INSERT INTO ... VALUES( %(x)s, %(y)s, %(z)s )", {'x':person_id, 'y':time_to_string(time), 'z':channel,} )
SQL never changes. Values associated with the change, but SQL never changes. It is much faster. Never create SQL queries dynamically. Never.
S. Lott
source share