Efficient way to bulk insert using get_or_create() in Django (SQL, Python, Django)

Is there a more efficient way to do this?

    for item in item_list:
        e, new = Entry.objects.get_or_create(
            field1=item.field1,
            field2=item.field2,
        )
+17
python django bulkinsert
Feb 12 2018-10-12
5 answers

You cannot do decent bulk inserts with get_or_create (or even with create), and there is no API that makes this easy.

If your table is simple enough that creating rows with raw SQL is not too painful, it is not too hard; something like:

    INSERT INTO site_entry (field1, field2) (
        SELECT i.field1, i.field2
        FROM (VALUES %s) AS i(field1, field2)
        LEFT JOIN site_entry AS existing
            ON (existing.field1 = i.field1 AND existing.field2 = i.field2)
        WHERE existing.id IS NULL
    )

where %s is a string like ("field1, field2"), ("field3, field4"), ("field5, field6"), that is, one parenthesized group of values per row, which you will need to build and escape properly yourself.
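One way to avoid escaping the values by hand is to generate only the placeholders and let the database driver bind the values. A minimal sketch, assuming two columns per row as in the query above; the helper name `build_insert_missing` is made up for illustration, and the table name is formatted straight into the SQL, so it must come from trusted code rather than user input:

```python
def build_insert_missing(rows, table="site_entry"):
    """Return (sql, params) that inserts only the rows not already present.

    `rows` is a non-empty list of (field1, field2) tuples. Only the
    placeholders go into the SQL text; the values travel separately.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    # One "(%s, %s)" group per row, e.g. "(%s, %s), (%s, %s)".
    placeholders = ", ".join(["(%s, %s)"] * len(rows))

    sql = (
        f"INSERT INTO {table} (field1, field2) "
        "SELECT i.field1, i.field2 "
        f"FROM (VALUES {placeholders}) AS i(field1, field2) "
        f"LEFT JOIN {table} AS existing "
        "ON existing.field1 = i.field1 AND existing.field2 = i.field2 "
        "WHERE existing.id IS NULL"
    )

    # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

In a Django project the pair would then go to a cursor obtained from `django.db.connection`, as in `cursor.execute(sql, params)`. Two caveats: the join only checks against rows already in the table, so duplicates inside `rows` itself would still be inserted twice, and the `VALUES ... AS i(field1, field2)` form is PostgreSQL-flavoured, so other backends may need a different query.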

+8
Feb 12 2018-10-12

It depends on what you are aiming for. You can use the manage.py loaddata command to load data from a file in a supported format (JSON, XML, YAML, ...).

See also this discussion.
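For the loaddata route, the input file is a list of serialized objects. A rough sketch of generating one as JSON, assuming the model lives in an app labelled `site` (a guess based on the `site_entry` table name used in another answer here); the helper name and the explicit primary keys are illustrative only:

```python
import json


def build_fixture(pairs, model_label="site.entry", first_pk=1):
    """Serialize (field1, field2) pairs in Django's JSON fixture layout."""
    records = [
        {
            "model": model_label,  # "<app_label>.<model_name>", assumed here
            "pk": first_pk + offset,
            "fields": {"field1": field1, "field2": field2},
        }
        for offset, (field1, field2) in enumerate(pairs)
    ]
    return json.dumps(records, indent=2)
```

Writing the returned string to a file such as entries.json and running `python manage.py loaddata entries.json` would load it. Note that loaddata matches on primary key, not on field1 and field2: a row with the same pk is overwritten, so this is not a drop-in replacement for get_or_create.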

+4
Feb 12 2018-10-12

If you are not sure if the things in your item_list already exist in your database and you need model objects, then get_or_create is definitely the way to go.

If you know that the items are NOT in your database, you would do much better with a plain create:

    for item in item_list:
        new = Entry.objects.create(
            field1=item.field1,
            field2=item.field2,
        )

And if you don't need the objects, just ignore the return value of the call. This will not speed up the database work, but it will help with memory usage if that is a concern.

If you are not sure whether the data is already in the database, but the field has unique=True set, then the database will enforce uniqueness and you can simply catch the exception and move on. This saves the extra database hit of first trying to select the existing object.

    from django.db import IntegrityError

    for item in item_list:
        try:
            new = Entry.objects.create(
                field1=item.field1,
                field2=item.field2,
            )
        except IntegrityError:
            # The row already exists; skip it.
            continue

You can speed things up in any case by managing transactions manually. Django automatically opens and commits a transaction for each save, but it provides decorators that increase efficiency considerably if you know you will be doing many database saves in a particular function. The Django docs explain all of this better than I can here, but you will probably want to pay particular attention to django.db.transaction.commit_on_success.
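On current Django versions commit_on_success no longer exists (it was deprecated in 1.6 and removed in 1.8); transaction.atomic plays that role. A rough sketch of the create-and-catch loop under a single transaction, assuming Entry has a unique constraint covering field1 and field2. The function name is made up, and the model class is passed in as an argument only so the snippet does not depend on a particular project:

```python
def create_missing(Entry, item_list):
    """Create the items that do not exist yet, all inside one transaction."""
    # Imported here so the sketch can be loaded without a configured project.
    from django.db import IntegrityError, transaction

    created = []
    with transaction.atomic():  # one commit for the whole loop
        for item in item_list:
            try:
                # The inner atomic block acts as a savepoint, so a failed
                # insert does not poison the outer transaction.
                with transaction.atomic():
                    created.append(
                        Entry.objects.create(
                            field1=item.field1,
                            field2=item.field2,
                        )
                    )
            except IntegrityError:
                # The row already exists; skip it.
                continue
    return created
```

The nested atomic block matters: catching IntegrityError directly inside an atomic block, without a savepoint around the failing query, leaves the transaction in a broken state on some backends.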

+1
Feb 12 2018-10-12

Starting with Django 1.4, you can use bulk_create.

See the docs.

Note the caveats, though: most importantly, the model's save() method will not be called, so the pre_save and post_save signals will not be sent.
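bulk_create on its own does not skip rows that already exist, so to get something close to the loop in the question you still have to work out which items are missing. A minimal sketch, assuming (field1, field2) identifies an entry; both function names are made up, and the Entry model is passed in as an argument so the snippet stays independent of any project:

```python
def missing_pairs(wanted, existing):
    """Return the pairs in `wanted` that are not in `existing`.

    Duplicates inside `wanted` are dropped and the original order is kept.
    """
    seen = set(existing)
    result = []
    for pair in wanted:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


def bulk_create_missing(Entry, item_list):
    """Insert the entries that do not exist yet with one bulk_create call."""
    wanted = [(item.field1, item.field2) for item in item_list]

    # One query for the existing pairs; values_list yields tuples.
    existing = Entry.objects.values_list("field1", "field2")

    to_create = [
        Entry(field1=field1, field2=field2)
        for field1, field2 in missing_pairs(wanted, existing)
    ]
    return Entry.objects.bulk_create(to_create)
```

This reads every existing pair into memory and is not safe against concurrent writers, so treat it as a starting point. On Django 2.2 and later, if the table has a suitable unique constraint, `bulk_create(objs, ignore_conflicts=True)` lets the database skip the duplicates instead.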

+1
Mar 09 '13 at 11:44

I would say no.

But I wonder what type your items are, given that they have field1 and field2. There seems to be another class that represents a record but does not derive from models.Model. Perhaps you could drop that class and build Entry instances directly instead of creating these items.

0
Feb 12 2018-10-12


