Django 1.2 + PostgreSQL: cascading deletes on foreign keys created with ON DELETE NO ACTION

I have a PostgreSQL database with approximately 150 tables (it is a Django 1.2 project). Django adds ON DELETE NO ACTION and ON UPDATE NO ACTION to foreign keys when it creates the tables.

Now I need to mass delete data (about 800,000 records) from a set of tables based on certain conditions.

Using Model.objects.filter(...).delete() is not an option, since the amount of data is huge and it takes far too long.

The only sane option seems to be cascading deletion, but since Django adds "ON DELETE NO ACTION", that seems to be ruled out.

So my question is: is there a simple way to change all the foreign keys (there are many) to ON DELETE CASCADE, or something along those lines?

(I know that I could manually write the SQL statements for each table, but that would be a monumental and tedious task.)

+4
3 answers

As pointed out in the link in Andrew's answer, if you set this to CASCADE in Django, then Django will go and delete the rows "retail", one at a time. If it is set to NO ACTION, you can create a foreign key definition at the database level to handle things instead. That sounds like a reasonable plan to me.
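A minimal sketch of that plan (the table and constraint names are hypothetical, not from the original answer): drop each existing foreign key and recreate it with ON DELETE CASCADE. Since there are many keys, the second statement uses the system catalogs to generate a DROP/ADD pair for every foreign key; it assumes none of the existing constraints carries an explicit ON DELETE clause (PostgreSQL omits the default NO ACTION from pg_get_constraintdef output), so review the generated statements before running them.

    -- Hypothetical names: recreate one foreign key with ON DELETE CASCADE.
    ALTER TABLE referencing_table
        DROP CONSTRAINT referencing_table_mytable_id_fkey;
    ALTER TABLE referencing_table
        ADD CONSTRAINT referencing_table_mytable_id_fkey
            FOREIGN KEY (mytable_id) REFERENCES mytable (id)
            ON DELETE CASCADE;

    -- Generate the same DROP/ADD pair for every foreign key in the database.
    SELECT 'ALTER TABLE '  || conrelid::regclass
        || ' DROP CONSTRAINT ' || quote_ident(conname) || '; '
        || 'ALTER TABLE '  || conrelid::regclass
        || ' ADD CONSTRAINT '  || quote_ident(conname) || ' '
        || pg_get_constraintdef(oid) || ' ON DELETE CASCADE;'
    FROM pg_constraint
    WHERE contype = 'f';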

Make sure you have an index defined on the referencing column(s) of each foreign key; otherwise you will see very slow performance. Some database products automatically create such an index when you define a foreign key, but there are situations where it is not beneficial, so PostgreSQL leaves the matter in your hands to optimize as you see fit. (As one example, it might not be worth maintaining the index during normal operation, but be worth building it before a purge and dropping it afterward.)
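For example, a minimal sketch with the same hypothetical names, building the index just for the purge:

    -- Index on the referencing column so each cascaded delete can find
    -- its dependent rows without a sequential scan.
    CREATE INDEX referencing_table_mytable_id_idx
        ON referencing_table (mytable_id);

    -- If the index is not worth maintaining during normal operation,
    -- drop it again once the purge is done.
    DROP INDEX referencing_table_mytable_id_idx;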

+1

One note: ON DELETE CASCADE performs poorly on bulk operations. The reason is that it is implemented as a per-row trigger, so algorithmically it looks like this:

    for row in delete_set:
        for dependent_row in (scan for referencing rows):
            delete dependent_row

If you delete 800,000 rows in the parent table, this translates into 800,000 individual delete scans on the dependent tables. Even in the best case, 800,000 individual index scans will be much slower than a single sequential scan.

The better way to do this is to use a writeable common table expression, on 9.1 or later, or simply to issue separate DELETE statements in the same transaction (see the sketch at the end of this answer). Something like:

    WITH rows_to_delete (id) AS (
        SELECT id FROM mytable WHERE where_condition
    ),
    deleted_rows (id) AS (
        DELETE FROM referencing_table
        WHERE mytable_id IN (SELECT id FROM rows_to_delete)
        RETURNING mytable_id
    )
    DELETE FROM mytable WHERE id IN (SELECT id FROM deleted_rows);

This comes down to an algorithm that looks something like:

    scan for rows to delete as delete_set
    for dependent in (scan for rows dependent on delete_set):
        delete dependent
    for to_delete in (scan for rows referenced by the deleted dependents):
        delete to_delete

Eliminating the forced nested-loop scan speeds things up significantly.
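For what it's worth, the "separate DELETE statements in the same transaction" variant mentioned above, for servers older than 9.1, could look something like this: a sketch with the same hypothetical names, where a temporary table stands in for the CTE.

    BEGIN;

    -- Capture the target ids once, instead of rescanning the condition.
    CREATE TEMP TABLE rows_to_delete ON COMMIT DROP AS
        SELECT id FROM mytable WHERE where_condition;

    -- Delete the dependents first, so the final delete passes the
    -- NO ACTION foreign key check.
    DELETE FROM referencing_table
        WHERE mytable_id IN (SELECT id FROM rows_to_delete);

    DELETE FROM mytable
        WHERE id IN (SELECT id FROM rows_to_delete);

    COMMIT;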

0
