Export CSV as a stream (from Django admin on Heroku)

We need to export a CSV file containing data from a model in the Django admin of an app that runs on Heroku. To do this, we created an admin action that builds the CSV and returns it in the response. This worked fine until our client started exporting huge amounts of data and we ran into the 30-second timeout of the web worker.

To get around this problem, we thought about streaming the CSV to the client instead of first creating it in memory and sending it in one piece. The trigger was this information from the Heroku docs:

Cedar supports long polls and streaming responses. Your application has an initial 30 second window to respond with one byte back to the client. After each byte is sent (either received from the client, or sent by your application), you reset the rolling 55 second window. If no data is sent within 55 seconds, your connection will be terminated.

So we put together something like this to check it out:

import cStringIO as StringIO
import csv
import time

from django.http import HttpResponse


def csv_view(request):  # don't name the view "csv": it would shadow the csv module
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # Hand back everything written so far and empty the buffer.
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def data():
        for i in xrange(100000):
            csvwriter.writerow([i, "a", "b", "c"])
            time.sleep(1)  # simulate slow row generation
            yield read_and_flush()

    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
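(Aside: on Django 1.5 or newer, StreamingHttpResponse is the built-in way to do this; a minimal sketch following the pseudo-buffer pattern from the Django docs, with stream_csv as a placeholder name:)

import csv

from django.http import StreamingHttpResponse


class Echo(object):
    """Pseudo-buffer whose write() just hands the value back to the caller."""
    def write(self, value):
        return value


def stream_csv(request):
    # csv.writer "writes" each formatted row into Echo, which returns it,
    # so the generator yields one CSV line per row without buffering.
    writer = csv.writer(Echo())
    rows = ([i, "a", "b", "c"] for i in xrange(100000))
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type="text/csv",
    )
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response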

The HTTP response headers look like this (from Firebug):

HTTP/1.1 200 OK
Cache-Control: max-age=0
Content-Disposition: attachment; filename=jobentity-job2.csv
Content-Type: text/csv
Date: Tue, 27 Nov 2012 13:56:42 GMT
Expires: Tue, 27 Nov 2012 13:56:41 GMT
Last-Modified: Tue, 27 Nov 2012 13:56:41 GMT
Server: gunicorn/0.14.6
Vary: Cookie
Transfer-Encoding: chunked
Connection: keep-alive

"Transfer-encoding: chunked" means that Cedar actually transfers the streams that we assume.

The problem is that the CSV download is still interrupted after 30 seconds, with these lines in the Heroku log:

2012-11-27T13:00:24+00:00 app[web.1]: DEBUG: exporting tasks in csv-stream for job id: 56
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 heroku[router]: at=info method=POST path=/admin/jobentity/ host=myapp.herokuapp.com fwd= dyno=web.1 queue=0 wait=0ms connect=2ms service=29480ms status=200 bytes=51092
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [12] [INFO] Booting worker with pid: 12

This should work conceptually, right? Is there something we missed?

We greatly appreciate your help. Tom

python django django-admin heroku cedar
2 answers

I found the solution to the problem. It is not a Heroku timeout, because otherwise there would be an H12 timeout in the Heroku log (thanks to Caio from Heroku for pointing this out).

The problem was the default Gunicorn worker timeout, which is 30 seconds. After adding --timeout 600 to the Procfile (on the Gunicorn line), the problem disappeared.

The Procfile now looks like this:

web: gunicorn myapp.wsgi -b 0.0.0.0:$PORT --timeout 600
celeryd: python manage.py celeryd -E -B --loglevel=INFO
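(The same setting can also live in a Gunicorn config file instead of the Procfile; a minimal sketch, assuming the file is named gunicorn_conf.py and started with gunicorn myapp.wsgi -c gunicorn_conf.py:)

# gunicorn_conf.py (assumed file name)
import os

bind = "0.0.0.0:%s" % os.environ.get("PORT", "8000")  # Heroku supplies $PORT
timeout = 600  # seconds a worker may stay silent before Gunicorn kills and restarts it

Note that raising the Gunicorn timeout only lifts the worker-side limit; per the Cedar quote above, the Heroku router still drops the connection if no byte is sent for 55 seconds, so the stream must keep flushing rows.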

This is most likely not a problem with your script, but Heroku's default 30-second timeout for web requests. You could read the following: https://devcenter.heroku.com/articles/request-timeout and, per that document, move the CSV export to a background process.
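(A minimal sketch of that approach, reusing the Celery worker already in the Procfile above; JobEntity, its fields, and the /tmp path are hypothetical placeholders, and on Heroku the dyno filesystem is ephemeral, so real code would upload the file to S3 or similar:)

# tasks.py
import csv

from celery.task import task

from myapp.models import JobEntity  # hypothetical model


@task
def export_job_csv(job_id):
    # Build the CSV outside the request/response cycle,
    # free of the 30-second web timeout.
    path = "/tmp/jobentity-job%s.csv" % job_id
    with open(path, "wb") as csvfile:
        writer = csv.writer(csvfile)
        for row in JobEntity.objects.filter(job_id=job_id).values_list("id", "name"):
            writer.writerow(row)
    return path

The admin action would then just call export_job_csv.delay(job_id) and tell the user where to pick up the file once the task finishes.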

