You can choose the approach depending on whether “A” fits into memory: if it does, load “A” into an in-memory lookup keyed on uniq_id and sequentially scan “B” against it.
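A minimal sketch of that in-memory approach (the line format and timestamp format are taken from the example below; the function names are my own):

```python
from datetime import datetime

TIME_FORMAT = '[%Y-%m-%d %H:%M:%S]'

def parse_a(line):
    # '[2012-09-12 12:23:33] SOME_UNIQ_ID filesize'
    timestamp, uniq_id, filesize = line.rsplit(' ', 2)
    return uniq_id, (datetime.strptime(timestamp, TIME_FORMAT), filesize)

def diff_logs(a_lines, b_lines):
    # Load all of "A" into a dict keyed on uniq_id (this is the part
    # that must fit into memory)...
    a_index = dict(parse_a(line) for line in a_lines)
    # ...then sequentially scan "B", looking each id up in the dict.
    for line in b_lines:
        timestamp, uniq_id = line.rsplit(' ', 1)
        if uniq_id in a_index:
            a_ts, _filesize = a_index[uniq_id]
            b_ts = datetime.strptime(timestamp, TIME_FORMAT)
            yield uniq_id, abs(a_ts - b_ts)

a = ['[2012-09-12 12:23:33] SOME_UNIQ_ID filesize']
b = ['[2012-09-12 13:23:33] SOME_UNIQ_ID']
for uniq_id, delta in diff_logs(a, b):
    print(uniq_id, 'has a difference of', delta)
    # SOME_UNIQ_ID has a difference of 1:00:00
```

In real use `a_lines` and `b_lines` would be open file objects rather than lists, so only “A” is ever held in memory.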
Otherwise, load the log files into a SQLite3 database as two tables (log_a, log_b), each holding (timestamp, uniq_id, rest_of_line), then run a SQL join on uniq_id and do whatever processing you need on the results. This keeps memory overhead low and lets the SQL engine do the join, but of course it effectively duplicates the log files on disk (usually not a problem on most systems).
Example
import sqlite3
from datetime import datetime

db = sqlite3.connect(':memory:')

db.execute('create table log_a (timestamp, uniq_id, filesize)')
a = ['[2012-09-12 12:23:33] SOME_UNIQ_ID filesize']
for line in a:
    timestamp, uniq_id, filesize = line.rsplit(' ', 2)
    db.execute('insert into log_a values(?, ?, ?)', (timestamp, uniq_id, filesize))
db.commit()

db.execute('create table log_b (timestamp, uniq_id)')
b = ['[2012-09-12 13:23:33] SOME_UNIQ_ID']
for line in b:
    timestamp, uniq_id = line.rsplit(' ', 1)
    db.execute('insert into log_b values(?, ?)', (timestamp, uniq_id))
db.commit()

TIME_FORMAT = '[%Y-%m-%d %H:%M:%S]'
for matches in db.execute('select * from log_a join log_b using (uniq_id)'):
    log_a_ts = datetime.strptime(matches[0], TIME_FORMAT)
    log_b_ts = datetime.strptime(matches[3], TIME_FORMAT)
    print(matches[1], 'has a difference of', abs(log_a_ts - log_b_ts))
    # SOME_UNIQ_ID has a difference of 1:00:00
    # '1:00:00' == str(datetime.timedelta(0, 3600))
Note that:
- sqlite3's .connect should be given a file name instead of ':memory:'
- a and b should be your actual log files, not in-memory lists
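Adapting the example along those lines might look like this sketch, which streams a log file into an on-disk-capable database (the function name and the temporary demo file are my own; in practice you would pass your real database path and log file path):

```python
import sqlite3
import os
import tempfile

def load_log_a(db, path):
    # Stream the log file line by line rather than reading it all into memory
    db.execute('create table if not exists log_a (timestamp, uniq_id, filesize)')
    with open(path) as f:
        for line in f:
            timestamp, uniq_id, filesize = line.rstrip('\n').rsplit(' ', 2)
            db.execute('insert into log_a values(?, ?, ?)',
                       (timestamp, uniq_id, filesize))
    db.commit()

# Demo with a temporary file standing in for the real log
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as f:
    f.write('[2012-09-12 12:23:33] SOME_UNIQ_ID 1024\n')

db = sqlite3.connect(':memory:')  # pass a file name like 'logs.db' for on-disk storage
load_log_a(db, f.name)
os.unlink(f.name)
```

The same pattern applies to log_b, after which the join query from the example runs unchanged.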