Insert a million small records into Extensible Storage Engine (JetBlue) - fast

I hope Laurion Berhall read this :-)

I need to insert a million tiny records as quickly as possible.

Now I am in a very narrow cycle, where for each record I

a) start a transaction (JetBeginTransaction) b) prepare an update (JetPrepareUpdate) c) add the row (JetSetColumns) d) commit the transaction (JetCommitTransaction) 

Now, during this process, I am in a tight loop on one processor. The target machine has several processors, excellent drives, and plenty of free RAM.

I am wondering how to improve performance.

As for transactions, I did some experimentation and had problems when errors returned if I put too much data in one transaction. I would like to better understand what is happening there - do I have an error or the size of the transaction is crowned, if it is limited, can I increase the cap? Am I just studying this because I assume that a transaction gives ESE the ability to do more caching in RAM while minimizing disk flushes? β€œIs that just a hunch?”

In general, how can I use multiple processors / lots of RAM / and nice drives? Do I open the database twice and from there? I'm not quite sure what is going on with regard to thread and transaction security. If I have two handles to the database, each in a transaction, will the record on one descriptor be available for the second immediately, before committing, or do I need to commit first?

any tips appreciated

  here are the constraints a) I've got a million records that need to be written into the DB as fast as possible b) to fully generate the record for insertion there are two searches that need to occur within the same table (seeking keys) c) This is a rebuild/regeneration of the DB - it either worked, or it didnt. If it didnt there is no going back, a fresh rebuild/regeneration is needed. I cannot restart mid process and without all the data none of the data is valuable. READ: having one big transaction is fine if it improves perf. I'd like ESE to cache, in ram, if that helps perf. 

thanks!

+4
source share
2 answers

For single-threaded performance, the most important task is your transaction model.

If you tried to put more data in one transaction, and it failed, you probably got JET_errOutOfVersionStore. Esent must track the cancellation of information for all operations performed in a transaction (to enable rollback), and this information is stored in the version store. The default version store size is pretty small. You can increase it using the system parameter JET_paramMaxVerPages. A value of 1024 (64 MB version storage) will allow for fairly large transactions. I suggest doing 100-1000 inserts per transaction.

When you call JetCommitTransaction, Esent will clear the disk log, generating synchronous I / O. To avoid this, pass JET_bitCommitLazyFlush to JetCommitTransaction. Your transactions will still be atomic, but not durable in the event of a failure (everything will be fine if you exit normally). It looks like it needs to be found for your use.

If you insert records in ascending order, you can leave with a single-threaded application. If you can change your implementation to make sequential inserts, you should - they are much faster. For random inserts, multiple threads may be useful. To use multiple threads, you just need to create new sessions (JetBeginSession) and open a database (JetOpenDatabase) for them. Esent uses the framework ( http://en.wikipedia.org/wiki/Snapshot_isolation ), so it cannot see changes made by other sessions that were not committed or committed after the start of the transaction. This is different from reading if you see changes after you make another session. You may need to think about how to split the work to deal with this.

+5
source

Make sure you insert "OK." What is a cluster (primary) key? Is this an artificial auto market? If so, you put in order. How many secondary indexes do you have? Ideally, you would not have one so that all inserts, for example. only cluster insert will be in index order. When you insert an index order, JetUpdate simply joins the end of the index. This is much faster than inserting in the middle of the index. Hope this helps, Yang.

+1
source

All Articles