Python 2 assumes different source code encodings

I noticed that, without a source code encoding declaration, the Python 2 interpreter assumes the source code is encoded in ASCII for scripts and standard input:

    $ python test.py  # where test.py holds the line: print u'é'
      File "test.py", line 1
    SyntaxError: Non-ASCII character '\xc3' in file test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
    $ echo "print u'é'" | python
      File "/dev/fd/63", line 1
    SyntaxError: Non-ASCII character '\xc3' in file /dev/fd/63 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

and that it is encoded in ISO-8859-1 with the -m module and -c command flags:

    $ python -m test  # where test.py holds the line: print u'é'
    Ã©
    $ python -c "print u'é'"
    Ã©

Where is this documented?

Contrast this with Python 3, which always assumes that the source code is encoded in UTF-8 and thus prints é in all four cases.

Note: I tested this on CPython 2.7.14 on both macOS 10.13 and Ubuntu Linux 17.10, with the console encoding set to UTF-8.

1 answer

The -c and -m switches ultimately (*) run the supplied code with the exec statement or the compile() function, both of which take Latin-1 source code:

The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple.

That -c and -m behave this way is not documented; it is an implementation detail that may or may not be considered a bug.
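As a rough illustration of what the Latin-1 assumption means in practice, the sketch below decodes the two bytes that a UTF-8 console produces for é under both encodings. It only mimics the decoding step and is not the interpreter's actual code path; the variable names are made up for the example.

```python
# The two bytes a UTF-8 terminal or editor produces for the character é.
utf8_bytes = b"\xc3\xa9"

# Decoded as UTF-8, the bytes form a single character (U+00E9).
as_utf8 = utf8_bytes.decode("utf-8")

# Decoded as Latin-1, every byte becomes its own character (U+00C3, U+00A9).
as_latin1 = utf8_bytes.decode("latin-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

Latin-1 maps every byte value to a character, so this decoding never fails; it silently yields two characters where the author typed one.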

I don't think this is worth fixing, however, and Latin-1 is a superset of ASCII, so little is lost. How code from -c and -m is handled has been cleaned up in Python 3 and is much more consistent there: code passed in with -c is decoded using the current locale, and modules loaded with the -m switch default to UTF-8, as usual.
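The UTF-8 default in Python 3 can be checked with a small sketch, assuming a Python 3 interpreter. It passes raw source bytes to compile(), once without an encoding declaration and once with a PEP 263 declaration for Latin-1. The helper name run_source and the file name "<demo>" are hypothetical and chosen only for this example.

```python
# Python 3 only: compile() accepts source bytes and decodes them itself.

# Source without an encoding declaration, stored as UTF-8 bytes.
utf8_source = u"text = '\u00e9'\n".encode("utf-8")

# Source with a PEP 263 declaration, stored as Latin-1 bytes.
latin1_source = u"# -*- coding: latin-1 -*-\ntext = '\u00e9'\n".encode("latin-1")


def run_source(source_bytes):
    """Compile and execute source bytes, returning the value bound to `text`."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["text"]


default_result = run_source(utf8_source)     # decoded as UTF-8 by default
declared_result = run_source(latin1_source)  # decoded per the declaration

print(default_result == declared_result)
```

Both calls should yield the same single-character string, because the declaration tells the compiler how to decode the bytes that the default would otherwise read as UTF-8.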


(*) If you want to know the exact implementations used, start with the Py_Main() function in Modules/main.c, which handles both -c and -m as:

    if (command) {
        sts = PyRun_SimpleStringFlags(command, &cf) != 0;
        free(command);
    } else if (module) {
        sts = RunModule(module, 1);
        free(module);
    }