My preference, when I have to do it the way you describe, is to write the data to the database in batches of about 1000 rows. That size is a reasonable compromise: if the process crashes, you lose at most 1000 rows' worth of work that has to be regenerated (re-scraped, in your case), yet each batch is still a healthy enough bite to reduce per-insert overhead.
As @derobert points out, wrapping a batch of insertions in a transaction also helps reduce overhead. But don't put everything into one giant transaction: some DBMS vendors, such as Oracle, keep a redo log during the transaction, and doing too much work in one go can overload it. Splitting the work into large, but not too large, pieces is better; again, 1000 rows per transaction is a reasonable size.
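Here is a minimal sketch of that pattern, assuming Python with the standard-library sqlite3 module; the `pages` table and the `scrape_rows()` generator are hypothetical placeholders, not from your code:

```python
import sqlite3

BATCH_SIZE = 1000  # rows per transaction, per the discussion above

conn = sqlite3.connect("scrape.db")
conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, body TEXT)")

batch = []
for row in scrape_rows():  # hypothetical generator yielding (url, body) tuples
    batch.append(row)
    if len(batch) >= BATCH_SIZE:
        with conn:  # opens a transaction, commits on success
            conn.executemany("INSERT INTO pages (url, body) VALUES (?, ?)", batch)
        batch.clear()

if batch:  # flush the final partial batch
    with conn:
        conn.executemany("INSERT INTO pages (url, body) VALUES (?, ?)", batch)

conn.close()
```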
Some SQL implementations support multi-row INSERT statements (this is also mentioned by @derobert), but some do not.
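Where it is supported, a multi-row INSERT packs several rows into a single statement. A hedged sketch, reusing the same hypothetical `pages` table as above:

```python
import sqlite3

conn = sqlite3.connect("scrape.db")
rows = [("http://a.example", "..."), ("http://b.example", "...")]

# One VALUES group per row; this syntax is supported by MySQL,
# PostgreSQL, and SQLite 3.7.11+, among others.
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat = [v for row in rows for v in row]  # flatten to match the placeholders
with conn:
    conn.execute(f"INSERT INTO pages (url, body) VALUES {placeholders}", flat)
conn.close()
```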
You are right that writing the raw data out to a flat file and then bulk-loading it is probably worth it. Each SQL vendor supports this kind of bulk load in its own way, such as LOAD DATA INFILE in MySQL or `.import` in SQLite. You would need to tell us which brand of SQL database you are using to get more specific advice, but in my experience such a mechanism can deliver 10-20x the performance of INSERT, even after improvements like transactions and multi-row insertion.
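As an illustration of the flat-file approach with MySQL, here is a sketch assuming the mysql-connector-python driver, a two-column CSV, and that LOCAL INFILE is enabled on both client and server (the credentials and table are hypothetical):

```python
import csv
import mysql.connector  # assumption: mysql-connector-python is installed

# Stage the scraped rows into a flat CSV file first.
with open("pages.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([("http://a.example", "..."), ("http://b.example", "...")])

conn = mysql.connector.connect(
    host="localhost", user="scraper", password="...",  # hypothetical credentials
    database="scrape", allow_local_infile=True,
)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE 'pages.csv'
    INTO TABLE pages
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (url, body)
""")
conn.commit()
conn.close()
```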
Re your comment: you could take a look at BULK INSERT in Microsoft SQL Server. I don't normally use Microsoft's products, so I have no first-hand experience with it, but it sounds like a useful tool for your scenario.
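I haven't used it myself, but based on the documented syntax a BULK INSERT call would look roughly like this; treat it as a sketch under assumptions (the pyodbc driver, hypothetical connection details, and a CSV path visible to the server), not tested advice:

```python
import pyodbc  # assumption: pyodbc is installed and an ODBC driver is configured

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=scrape;Trusted_Connection=yes;"  # hypothetical connection details
)
cur = conn.cursor()
# NOTE: the file path is resolved on the SQL Server machine, not the client.
cur.execute("""
    BULK INSERT pages
    FROM 'C:\\data\\pages.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')
""")
conn.commit()
conn.close()
```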