Python open() behavior with Unicode file names varies across OSes

The file name looks like:

filename = u"/direc/tories/español.jpg"

And I call open() like this:

fp = open(filename, "rb")

This opens the file correctly on OS X (10.7), but on Ubuntu 11.04 the open() function tries to open u"espa\xf1ol.jpg" and fails with an IOError.

While trying to fix this, I checked sys.getfilesystemencoding() on both systems; both are set to utf-8 (although Ubuntu reports it in uppercase, that is, UTF-8; I'm not sure if this is relevant). I also put # -*- coding: utf-8 -*- at the top of the Python file, but I am fairly sure that this only affects the encoding of the source file itself and not any external functions or how Python interacts with system resources. The file exists on both systems, and the eñe displays correctly.

The final question: how do I open the file español.jpg on an Ubuntu system?

Edit: The string español.jpg actually comes out of the database through the Django ORM (ImageFileField), but by the time I use it and see the difference in behavior, I have a single Unicode string that is the absolute path to the file.

2 answers

The following should work on both systems:

fp = open(filename.encode(sys.getfilesystemencoding()), "rb")
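
As a minimal sketch of the same idea, the helper below encodes a text path to bytes with the filesystem encoding before opening it. The name open_unicode_path is hypothetical and not part of Python or Django. It assumes the filesystem encoding can represent every character in the name; if it cannot, encode() raises UnicodeEncodeError.

```python
# -*- coding: utf-8 -*-
import io
import sys


def open_unicode_path(path, mode="rb"):
    """Open `path`, encoding a text path with the filesystem encoding first.

    `open_unicode_path` is a hypothetical helper name used for illustration.
    """
    if not isinstance(path, bytes):
        # Text (unicode) path: convert it to the bytes the OS expects.
        path = path.encode(sys.getfilesystemencoding())
    # io.open accepts a byte-string path on both Python 2 and Python 3.
    return io.open(path, mode)
```

With the path from the question, this would be called as fp = open_unicode_path(filename). A path that is already a byte string is passed through unchanged.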

It is not enough to just declare the file encoding at the top of the file. Make sure your editor actually uses the same encoding and saves the text in that encoding. If necessary, retype any non-ASCII character to make sure your editor does the right thing.
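
One way to check whether the editor saved the literal as expected is to test the string for the typical sign of a mismatch: UTF-8 bytes that were decoded as Latin-1, so that ñ shows up as two characters (u"\xc3\xb1") instead of one (u"\xf1"). The sketch below is a heuristic only, and looks_like_mojibake is a hypothetical helper name.

```python
# -*- coding: utf-8 -*-


def looks_like_mojibake(text):
    """Heuristic: True if `text` looks like UTF-8 bytes mis-decoded as Latin-1.

    `looks_like_mojibake` is a hypothetical helper name used for illustration.
    """
    try:
        # Undo a Latin-1 decode, then try to read the bytes as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either step failed, so the text was not produced by that mistake.
        return False
    # Pure-ASCII text round-trips unchanged and is not suspicious.
    return repaired != text
```

For a correctly saved literal such as u"espa\xf1ol.jpg" the check returns False. Inspecting repr(filename) gives the same information by eye.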
