I am parsing a log and inserting it into either MySQL or SQLite using SQLAlchemy and Python. Right now I open a database connection, and as I loop over each line, I insert it as soon as it is parsed (this is just one big table right now; I'm not very experienced with SQL). I then close the connection when the loop is done. Generalized code:
from sqlalchemy import schema, types, create_engine

metadata = schema.MetaData()
log_table = schema.Table('log_table', metadata,
    schema.Column('id', types.Integer, primary_key=True),
    schema.Column('time', types.DateTime),
    schema.Column('ip', types.String(length=15))
    ....
)
engine = create_engine(...)
metadata.bind = engine
connection = engine.connect()
....
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        pythonified = pythoninfy_log(fields)  # Turn them into ints, datetimes, etc.
        if use_sql:
            ins = log_table.insert(values=pythonified)
            connection.execute(ins)
            parsed += 1
My two questions are:
- Is there a way to speed up the insertions within this basic structure? Maybe a queue of inserts and some insert threads, or some sort of bulk inserts, etc.? (Something along the lines of the sketch after these questions is what I'm imagining for the bulk-insert case.)
- When I used MySQL, inserting about 1.2 million records took roughly 15 minutes; with SQLite it took just over an hour. Does that time difference between the two database engines seem right, or does it mean I am doing something very wrong?
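For the first question, something like the sketch below is roughly what I have in mind by a bulk insert. The batch_size value is just a placeholder I made up, and I am assuming that passing a list of dicts to connection.execute() does an executemany-style insert; the other names (log_table, connection, file_to_parse, line_regex, pythoninfy_log) are the same ones from the code above:

batch = []
batch_size = 1000  # placeholder batch size, not tuned

for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        batch.append(pythoninfy_log(m.groupdict()))
        if len(batch) >= batch_size:
            # hand the whole batch to one execute() call instead of one row at a time
            connection.execute(log_table.insert(), batch)
            batch = []

if batch:  # flush whatever is left over
    connection.execute(log_table.insert(), batch)

For the second question, I'm also wondering whether SQLite is committing after every single insert in my current loop, which could account for the hour; wrapping the whole loop in one explicit transaction (trans = connection.begin() before the loop, trans.commit() after) is something I plan to try.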