How does cgi.FieldStorage store files?

Question

So, I played with raw WSGI, cgi.FieldStorage and file uploads, and I just don't understand how FieldStorage deals with uploaded files.

At first it seemed like it was just storing the whole file in memory, and I thought: hm, that should be easy to check, a large file should clog up memory! But it doesn't. Yet when I ask for the file, I get a string, not an iterator, a file object, or anything like that.

I tried reading the source of the cgi module and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how the fsck does this work?!

Here is the code I used:

```python
import cgi
from wsgiref.simple_server import make_server


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
        <input type="file" name="failas" />
        <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return output


if __name__ == '__main__':
    httpd = make_server('', 8000, app)
    print 'Serving'
    httpd.serve_forever()
```

Thanks in advance!:)

python wsgi cgi

Justinas Jul 27 '11 at 14:45

3 answers

The best way is NOT to read the file at all (not even one line at a time, as gimel suggests).

Instead, subclass FieldStorage and override its make_file method. FieldStorage calls make_file when it needs a file to hold an uploaded file's contents.

For reference, the default make_file looks like this:

```python
def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.
    """
    import tempfile
    return tempfile.TemporaryFile("w+b")
```

So, instead of creating a temporary file, your override can create a permanent file wherever you want.
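A minimal sketch of the contract the docstring describes (data is written, then `seek(0)`, then data is read), written for Python 3 and deliberately not importing `cgi`, since that module was removed in Python 3.13. `spool_upload` and `default_make_file` are hypothetical names; the factory argument plays the role of `make_file`:

```python
import tempfile


def default_make_file():
    # Same call the stock make_file makes: an anonymous, binary,
    # read/write temporary file that disappears when closed.
    return tempfile.TemporaryFile("w+b")


def spool_upload(chunks, make_file=default_make_file):
    """Hypothetical helper mimicking how FieldStorage uses make_file."""
    f = make_file()
    for chunk in chunks:
        f.write(chunk)  # 1. data is written to it
    f.seek(0)           # 2. seek(0)
    return f            # 3. the caller reads data from it


if __name__ == "__main__":
    upload = spool_upload([b"first part, ", b"second part"])
    print(upload.read())  # b'first part, second part'
    upload.close()
```

Passing a factory that opens a real path instead, e.g. `lambda: open("upload.bin", "w+b")` (the file name is only an example), gives you the "permanent file" variant of the override.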

hasanatkazmi Nov 29 '11 at 16:41

Building on @hasanatkazmi's answer (I used it in a Twisted app), I ended up with something like this:

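```python
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import cgi
import tempfile


class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        # Set path before the base __init__ runs: parsing happens there,
        # and parsing is what ends up calling make_file().
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            # No path given: create a named file that is NOT deleted on
            # close, and remember where it lives.
            f = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = f.name
            return f
        # A path was given: write the upload there.
        return open(self.path, 'w+b')
```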
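The detail that makes this work is `delete=False`: the named temporary file survives `close()`, so `self.path` still points at real data afterwards. Here is a small Python 3 sketch of the same `make_file` logic, pulled out of the `cgi` subclass so it runs without that module; `PredictablePath` is a hypothetical name, not part of any library:

```python
import os
import tempfile


class PredictablePath:
    """Hypothetical stand-in for PredictableStorage.make_file, minus cgi."""

    def __init__(self, path=None):
        self.path = path

    def make_file(self, binary=None):
        if not self.path:
            # delete=False: the file stays on disk after close(),
            # so self.path remains usable later.
            f = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = f.name
            return f
        # A path is known: (re)open it, truncating any old content.
        return open(self.path, "w+b")


if __name__ == "__main__":
    holder = PredictablePath()
    f = holder.make_file()
    f.write(b"payload")
    f.close()
    with open(holder.path, "rb") as saved:
        print(saved.read())  # b'payload'
    os.remove(holder.path)  # clean up: nothing deletes it automatically
```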

Be warned that the cgi module does not always create the file. According to these lines of cgi.py, it is created only once the content exceeds 1000 bytes:

```python
if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')
```
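To see why small uploads never reach `make_file`, here is a sketch of that spill-over rule in isolation (Python 3, no `cgi` import). `SpillingBuffer` is a hypothetical class modelled on the quoted check: data stays in an in-memory buffer until a write would push it past 1000 bytes, then everything moves to the file returned by the `make_file` factory:

```python
import io
import tempfile


class SpillingBuffer:
    """Keep small data in memory; move it to make_file() past a limit."""

    THRESHOLD = 1000  # the limit hard-coded in the quoted cgi.py line

    def __init__(self, make_file=lambda: tempfile.TemporaryFile("w+b")):
        self._make_file = make_file
        self._mem = io.BytesIO()  # in-memory buffer used first
        self.file = self._mem

    def write(self, data):
        # Same test as cgi.py: would this write push us past the limit?
        if self._mem is not None and self._mem.tell() + len(data) > self.THRESHOLD:
            disk = self._make_file()
            disk.write(self._mem.getvalue())  # copy what was buffered so far
            self._mem = None                  # never switch back to memory
            self.file = disk
        self.file.write(data)

    @property
    def on_disk(self):
        return self._mem is None


if __name__ == "__main__":
    buf = SpillingBuffer()
    buf.write(b"a" * 600)
    print(buf.on_disk)  # False: 600 bytes fit under the limit
    buf.write(b"b" * 600)
    print(buf.on_disk)  # True: 1200 bytes would exceed it
```

If you need this behaviour outside `cgi`, the standard library's `tempfile.SpooledTemporaryFile(max_size=...)` is built on a similar idea.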

So you should check whether the file was actually created by looking at the path attribute of the custom class, for example:

```python
if file_field.path:
    # Using an already created file...
    pass
else:
    # Creating a temporary named file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
    # You can save the 'f.name' field for later usage.
```

If the field also carries a Content-Length header (which seems rare), cgi creates the file as well.

That's it. This way you can store the file in a predictable location while reducing your application's memory usage.

Vladius Jul 11 '15 at 11:25

gimel · Accepted Answer · 2011-07-27T16:49:19+0000

The documentation of the cgi module has a paragraph on handling file uploads:

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:

```python
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
```

As for your example, getfirst() is just a variant of getvalue(). Try replacing

```python
f = fs.getfirst('failas')
```

with

```python
f = fs['failas'].file
```

This returns a file-like object that you can read "at your leisure."
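Following that advice, here is a hedged Python 3 sketch of saving the upload without ever loading it into memory. It assumes the object is seekable (cgi hands back either an in-memory buffer or the file from make_file); `save_upload` and `count_lines` are hypothetical helpers, and an `io.BytesIO` stands in for `fs['failas'].file`:

```python
import io
import os
import shutil
import tempfile


def save_upload(fileobj, dest_path, chunk_size=64 * 1024):
    """Copy an uploaded file object to dest_path in fixed-size chunks."""
    fileobj.seek(0)  # assumption: the upload object is seekable
    with open(dest_path, "wb") as out:
        # copyfileobj reads at most chunk_size bytes at a time
        shutil.copyfileobj(fileobj, out, chunk_size)
    return os.path.getsize(dest_path)


def count_lines(fileobj):
    """Same result as the documentation's readline() loop."""
    fileobj.seek(0)
    return sum(1 for _ in fileobj)


if __name__ == "__main__":
    upload = io.BytesIO(b"line 1\nline 2\nline 3\n")  # stand-in upload
    print(count_lines(upload))  # 3
    dest = os.path.join(tempfile.mkdtemp(), "upload.bin")
    print(save_upload(upload, dest))  # 21
```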
