Which encoding declaration should I use in Python?

I learned from a website that I have to add an encoding declaration to a Python file when it contains non-ASCII characters: http://www.python.org/dev/peps/pep-0263/ , but I am still confused about it.

Suppose I work on Linux with vim, create a new .py file, and enter the following code:

    #!/usr/bin/python2.7
    # -*- coding: utf8 -*-
    s = u'ޔ'
    print s

1. I tried replacing line 2 with the following code:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

but it doesn’t work, right?

2. I am not very familiar with Linux, and I really do not know why I should add -*- at the beginning and at the end of the declaration. When I tried to replace # -*- coding: utf8 -*- with # code=utf8 or # code: utf8 , I got an error:

      File "pythontest.py", line 3
    SyntaxError: Non-ASCII character '\xde' in file pythontest.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

but this form of declaration is mentioned at http://www.python.org/dev/peps/pep-0263/ !

According to the documentation, the declaration may also be written as follows:

 # This Python file uses the following encoding: utf-8 

What is that? I do not see how a computer could recognize it. What exactly is the declaration supposed to declare? I am getting more and more confused.

Thanks for the help.

4 answers

The PEP abstract you are referencing really says it all:

This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

(my emphasis).

Even if what you wanted to do worked (setting the encoding of the source file programmatically), it would not make any sense. Think about it: the source code is static (it doesn't change while the program runs), and the parser has to decode the file before any statement in it can execute. It would be pointless to try to read it with a different encoding: there is only one correct one (the one the author's editor used when saving the file).
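To illustrate why there is only one correct encoding, here is a minimal sketch. It is written in Python 3 syntax for convenience (the question uses Python 2.7), and latin-1 is just an arbitrary example of a wrong guess. It takes the bytes an editor would save for the character in the question and decodes them two ways:

```python
# The character from the question is U+0794. Saved as UTF-8 it becomes
# two bytes, the first of which is the '\xde' seen in the error message.
raw = "\u0794".encode("utf-8")
print(raw)  # b'\xde\x94'

# Decoding with the encoding the file was saved in restores the character.
as_utf8 = raw.decode("utf-8")

# Decoding with a wrong guess (latin-1 here, purely as an example)
# silently yields two unrelated characters instead of one.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), len(as_latin1))  # 1 2
```

The declaration does not convert anything; it only tells the parser which of these interpretations matches the bytes on disk.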

Regarding the syntax:

 # This Python file uses the following encoding: utf-8 

The PEP itself describes this syntax as “Without interpreter line, using plain text”. The wording is there for people: if you open the file in a text editor and find it full of gibberish, you can read the comment and manually set the encoding in the editor's menu. The parser still recognizes it, because the text contains coding: utf-8 (the tail of "encoding: utf-8").

EDIT: Why should you put the encoding between # -*- and -*- ? This is purely a convention. The first character, a hash sign, says that the line is a comment (so it will not be compiled to bytecode). The -*- markers simply flag it as a special comment; this is the format that editors such as Emacs recognize, while Python itself only looks for the coding: part.

This is no different from writing this in your source:

 # TODO: fix this nasty bug 

in which the TODO: part tells the developer (and some IDEs) that this is a message requiring action. You could use whatever you want, including @MarkZar or WTF! It is just convention!

HTH!


The essential part of the Python encoding declaration is coding: utf-8 . It must appear in a comment on the first or second line of the file, before any Python code; you can do whatever you want with the rest of the comment.

Here are the lines in the PEP that describe this behavior:

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as the encoding name. If the encoding is unknown to Python, an error is raised during compilation. There must not be any Python statement on the line that contains the encoding declaration.
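The rule can be approximated in a few lines. This is only a sketch of the check the PEP describes, not the interpreter's actual implementation; the function name find_declared_encoding is made up for illustration, and restricting the search to lines that start with # is a simplifying assumption:

```python
import re

# Pattern quoted from PEP 263.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(source_text):
    """Return the encoding named in a comment on line 1 or 2, else None.

    Hypothetical helper for illustration only.
    """
    for line in source_text.splitlines()[:2]:
        # Only comment lines can carry the declaration.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


shebang = "#!/usr/bin/python2.7\n"
print(find_declared_encoding(shebang + "# -*- coding: utf8 -*-\n"))  # utf8
print(find_declared_encoding(shebang + "# code=utf8\n"))             # None
print(find_declared_encoding(shebang + "# code: utf8\n"))            # None
print(find_declared_encoding(
    "# This Python file uses the following encoding: utf-8\n"))      # utf-8
```

This also accounts for the error in the question: neither # code=utf8 nor # code: utf8 contains the literal text coding, so no declaration is found, whereas the plain-text sentence matches because "encoding: utf-8" ends in "coding: utf-8".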


You need the line, because you have to tell the compiler which encoding the source file uses.


The encoding is identified by searching for the regular expression coding[:=]\s*([-\w.]+) anywhere in the line. It means:

  • find the exact string coding= or coding: followed by zero or more whitespace characters, followed by a run of at least one character that is alphanumeric, _ , - or . .

  • capture at least one run ...

  • the captured part is used as an encoding.

That is, it’s completely legal to use something like

 # This program was written for Python 3. Encoding that should be used for decoding: UTF-8! 

because a string in the required format ("decoding: UTF-8") can be found in it.


Python 3 uses UTF-8 as the default encoding for source files, so Python 3 does not require # coding: utf-8 if your file is saved as UTF-8.
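In Python 3 you can ask the standard library what it would detect. The sketch below uses tokenize.detect_encoding, which takes a readline callable that returns bytes; the detect helper and the sample source strings are made up for illustration:

```python
import io
import tokenize


def detect(source_bytes):
    """Return the source encoding Python 3's tokenizer detects.

    Hypothetical helper wrapping tokenize.detect_encoding.
    """
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# No declaration at all: Python 3 falls back to UTF-8.
no_declaration = detect(b"s = 'x'\n")

# A plain-text comment is enough, as long as it contains "coding: <name>".
# The name comes back in normalized form.
plain_text = detect(
    b"# This Python file uses the following encoding: latin-1\ns = 'x'\n"
)

print(no_declaration)  # utf-8
print(plain_text)      # iso-8859-1
```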
