Some utf8 characters are allowed in python source, some of them

I noticed that I cannot use all unicode characters in my python source code.

While

def(何): 

totally believable (though pointless [maybe?])

 def N(N₀, t, λ) -> 'N(t)': 

this is not valid (subscript is zero).

I also cannot use some other characters, most of which I recognize as something other than letters (for example, mathematical operators). I always thought that if I just stick to the rules that I know, i.e. Compiling names from letters and numbers, with a letter as the first character, everything will be fine. Now a zero index is a "number". so my impression was wrong.

I know that I should avoid using special characters. However, the definition of the function above (exponential decay, which is) seems completely reasonable to me - because it will never change, and it so elegantly conveys all the information needed for use by another programmer.

So my question is, which characters are allowed and which are not? And where?

Edit
Well, it seems I was not clear enough. I am using python3, so there is no need to declare the encoding of the source file. Apparently, I thought that my definition of function in China works.

My question is why some characters are allowed there, while others are not. Zero indexes cause an error , an invalid character in the identifier , but the draft bold version works. Both are equally special that I would say.

I would like to know if there are any general rules that apply not only to my situation, should be. It seems that my mistake is not accidental.

Edit 2:

The answer was kindly provided by Beau Martínez, pointing to a link to the language where I should have looked first:

http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html It seems that all valid characters are selected.

+7
unicode
source share
3 answers

According to the language reference , Python 3 allows the use of a large number of characters as identifiers.

This null index character looks like a number, but it is not for Python; Python only processes the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. This is actually a character, so you can use it as an identifier (as if it were, for example, a Greek character, such like phi).

Important, how easily can you enter these characters using the keyboard? For example, I do not want to pull up the character map every time I have to call your functions. Calling "maximum_decay_rate" or something more intuitive for any user, not just physics, makes your code more readable.

If you say that it is forbidden, perhaps because you did not specify a character encoding for your source file. It can be specified using # -*- coding: utf-8 -*- (or ever encoding) at the beginning of the source file.

+4
source share

Tell Python what the correct encoding is:

https://www.python.org/dev/peps/pep-0263/

Or...

 # -*- coding: utf-8 -*- 

or

 # coding=utf-8 

As for the characters that are actually allowed in variable names, the limitation is usually alphabetic characters, numbers, and underscores.

The "subscript" is not really a number, it is an index.

+3
source share

Each Unicode character has special “properties” that can be found in the Unicode character database, and for our purpose, properties from the so-called general category are important. They allow you to break all the characters into large groups:

  • letters ( L )
  • numbers ( N )
  • tags ( M )
  • punctuation ( P )
  • characters ( S )
  • dividers ( Z )
  • other ( C )

Groups have subgroups, for example Lu - Uppercase_Letter . According to the Python Language Reference (3.4.1), you first need to normalize the sequence of characters in the form of NFKC (which in practice means decomposing characters with diacritics and "simplifying" them, for example, changing the index 0 to normal 0 ). Then the beginning of the identifier must be either an underscore or a letter (the entire group of letters plus Nl are alphabetic numbers), plus several other alphabetic characters. This becomes much more interesting when we look at characters that are allowed as an extension of an identifier. In addition, we can use: Decimal_Numbers ( Nd ), which are actually numbers from 0 to 9, but in many guises, for example MATHEMATICAL MONOSPACE DIGIT NINE , which is the \U0001D7FF (70 characters in total); most marks ( M ), with the exception of the included marks ( Me ) - here we have all diacritics (accents); all characters from the Pc subgroup are punctuation marks, therefore not only underscore, but also various connections (10 characters); some additional numeric characters (for example, Ethiopian digits from 0 to 9); midpoints (2 characters).

As can be seen from the above, N with index 0 should be taken as an identifier. When I tried to paste it from Word, both IDLE and Wing 101 inserted normalized forms into the editor (i.e. N0 ). I suspect that the author of the question tried to use a subscript character that could not be properly normalized.

0
source share

All Articles