Python process continues to grow in Django DB upload script

I am running a conversion script that inserts large amounts of data into a database through the Django ORM. I use manual transaction commits to speed up the process. I have hundreds of files to process, and each file will create more than a million objects.

I am using Windows 7 64-bit. I noticed that the Python process keeps growing until it consumes more than 800 MB, and that is only for the first file!

The script iterates over the entries in the text file, reuses the same variables, and does not accumulate lists or tuples.

I read here that this is a common problem for Python (and possibly for any program), but I was hoping that Django or Python has some explicit way to reduce the size of the process...

Here is a code overview:

    import sys, os
    sys.path.append(r'D:\MyProject')
    os.environ['DJANGO_SETTINGS_MODULE'] = 'my_project.settings'

    from django.core.management import setup_environ
    from convert_to_db import settings
    from convert_to_db.convert.models import Model1, Model2, Model3
    setup_environ(settings)

    from django.db import transaction

    @transaction.commit_manually
    def process_file(filename):
        data_file = open(filename, 'r')
        model1, created = Model1.objects.get_or_create([some condition])
        if created:
            model1.save()
        input_row_i = 0
        while 1:
            line = data_file.readline()
            if line == '':
                break
            input_row_i += 1
            if not (input_row_i % 5000):
                transaction.commit()
            line = line[:-1]  # remove \n
            elements = line.split(',')
            d0 = elements[0]
            d1 = elements[1]
            d2 = elements[2]
            model2, created = Model2.objects.get_or_create([some condition])
            if created:
                model2.save()
            model3 = Model3(d0=d0, d1=d1, d2=d2)
            model3.save()
        data_file.close()
        transaction.commit()

    # Some code that calls process_file() per file
Tags: python, memory-management, windows, django
2 answers

First, make sure DEBUG=False in your settings.py file. When DEBUG=True, every query sent to the database is stored in django.db.connection.queries. This will consume a large amount of memory if you import many records. You can check it from the shell:

    $ ./manage.py shell
    > from django.conf import settings
    > settings.DEBUG
    True
    > settings.DEBUG = False
    > # django.db.connection.queries will now remain empty / []
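If you need to leave DEBUG=True while debugging, you can also clear the stored query list periodically with django.db.reset_queries(). A minimal sketch, reusing the 5000-row batching from the question (the ORM calls are elided):

    # Sketch: empty Django's per-connection query log while importing,
    # assuming DEBUG is still True and the 5000-row batching from the question.
    from django import db
    from django.db import transaction

    @transaction.commit_manually
    def process_file(filename):
        input_row_i = 0
        for line in open(filename, 'r'):
            input_row_i += 1
            # ... get_or_create / save Model2 and Model3 here, as in the question ...
            if not (input_row_i % 5000):
                transaction.commit()
                db.reset_queries()  # empties django.db.connection.queries
        transaction.commit()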

If this does not help, try spawning a new Process to run process_file for each file. This is not the most efficient approach, but you are trying to keep memory usage down, not save CPU cycles. Something like this should get you started:

    from multiprocessing import Process

    for filename in files_to_process:
        p = Process(target=process_file, args=(filename,))
        p.start()
        p.join()
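Because p.join() is called right after p.start(), the files are processed one at a time; each child process exits when its file is done, so the memory it consumed is returned to the operating system instead of accumulating in a long-lived parent process.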

It is difficult to say exactly what is wrong; what I would suggest is to profile your code and see which part of it causes this memory burst.

Once you know which part of the code is holding the memory, you can think about how to reduce it.

If memory consumption does not go down even after your efforts, here is something you can do. Since processes receive memory in chunks (or pages), and releasing it back to the OS while the process is still running is difficult, the best option is to create a child process, do all your memory-intensive work in it, pass the results back to the parent process, and let the child die. That way the memory consumed by the child process is returned to the OS, and your parent process stays lean...
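A minimal sketch of that idea, assuming process_file is changed to return a small summary value (the worker function, run_all, and the returned row count are hypothetical names for illustration):

    # Sketch: run the memory-heavy import in a short-lived child process and
    # pass back only a small result; when the child exits, its memory is
    # returned to the OS. `process_file` returning a row count is an
    # assumption made for illustration.
    from multiprocessing import Process, Queue

    def worker(filename, result_queue):
        rows_imported = process_file(filename)        # memory-intensive work
        result_queue.put((filename, rows_imported))   # small result back to parent

    def run_all(files_to_process):
        # On Windows, call this from under `if __name__ == '__main__':`.
        results = {}
        for filename in files_to_process:
            q = Queue()
            p = Process(target=worker, args=(filename, q))
            p.start()
            name, count = q.get()   # read the result before joining
            p.join()                # child exits; its memory goes back to the OS
            results[name] = count
        return results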



