Valid characters for Lisp symbols

First of all, as I understand it, variable identifiers are called symbols in Common Lisp.

I noticed that while in languages such as C variable identifiers can contain only alphanumeric characters and underscores, Common Lisp allows many more characters, such as "*" and (at least in Scheme) "?".

So, I want to know: what is the complete set of characters that Common Lisp allows in a symbol (or in a variable identifier, if I am mistaken)? Is it the same for Scheme?

Also, is there a different set of characters for function names?

I have searched Google and looked at the CLHS and Practical Common Lisp, and for the life of me I must be doing something wrong, because I cannot find the answer.

3 answers

The detailed answer is a bit complicated. Common Lisp has an ANSI standard, which defines the set of available characters. Basically, you can use all of these characters in symbols. See also "Symbols as Tokens" in the CLHS.

For example,

|Polynom 2 * x ** 3 - 5 * x ** 2 + 10| 

is a valid symbol. Note that the vertical bars delimit the symbol and do not belong to the symbol name.

Then there are the existing implementations of Common Lisp and their support for various character sets and string types. Several of them support Unicode (or similar) and allow Unicode characters in symbol names.
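
As a small sketch of how the bars behave, assuming the standard readtable (which upcases unescaped tokens); the variable name `*sym*` is introduced here only for illustration:

```lisp
;; The vertical bars delimit the symbol; they are not part of its name.
(defparameter *sym* '|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|)

(symbol-name *sym*)    ; the name as a string, without the bars
(symbol-name 'foo)     ; unescaped tokens are upcased by the standard reader
(symbol-name '|foo|)   ; inside bars, case is preserved
(symbol-name 'a\ b)    ; a backslash escapes a single character (here a space)
```

Any character can appear in a symbol name this way; the escapes are only needed for characters that the reader would otherwise treat specially, such as whitespace or parentheses.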

LispWorks:

 CL-USER 1 > (list 'δ 'ψ 'σ)
 (δ ψ σ)

[From the point of view of Scheme. Although some concepts in Scheme and Common Lisp have the same name, this does not mean that they mean the same thing in the two languages.]

First of all, note that symbols and identifiers are two different things.

Symbols can be thought of as strings that support fast equality comparison. Two symbols s and t are equal (more or less) if they are spelled the same way. The operation string=? must loop over the characters of both strings and check that they are all the same, which takes time proportional to the length of the shorter string. Symbols, on the other hand, are automatically (by the runtime system) placed in a hash table (usually). Therefore symbol=? comes down to a simple pointer comparison and is thus very fast. Symbols are often used where one would use enumerations in C.

Symbols are values that may be present at runtime.
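
The same distinction can be sketched in Common Lisp, used here as a stand-in for the Scheme operations named above: `string=` compares character by character, while `eq` is the pointer comparison. The names `*s1*` and `*s2*` are made up for this illustration, and `copy-seq` is used only to guarantee two distinct string objects.

```lisp
;; Two separate string objects with the same contents.
(defparameter *s1* (copy-seq "hello"))
(defparameter *s2* (copy-seq "hello"))

(string= *s1* *s2*)   ; true: contents are compared character by character
(eq *s1* *s2*)        ; false: they are different objects in memory

;; Both occurrences of HELLO are read as the same interned symbol,
;; so comparing them is a single pointer comparison.
(eq 'hello 'hello)    ; true
```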

Identifiers are simply the names of variables in a program.

Now, if a given program is to be represented as a Scheme value, one option is to use symbols to represent identifiers, but this does not mean that symbols are identifiers (or vice versa). A better representation of identifiers (still in Scheme) is syntax objects, which, in addition to the identifier's name, also record where the identifier was read (or constructed). Suppose you encounter an undefined variable and want to report where in the program it occurs; it is then very convenient that the source location is part of the identifier's representation.

And last but not least: what are the legal characters of an identifier? Here it is best to quote chapter and verse from R6RS:

4.2.4 Identifiers

Most identifiers allowed by other programming languages are also acceptable to Scheme. In general, a sequence of letters, digits, and "extended alphabetic characters" is an identifier when it begins with a character that cannot begin a representation of a number object. In addition, +, -, and ... are identifiers, as is a sequence of letters, digits, and extended alphabetic characters that begins with the two-character sequence ->. Here are some examples of identifiers:

 lambda q soup list->vector + V17a <= a34kTMNs ->- the-word-recursion-has-many-meanings 

Extended alphabetic characters can be used in identifiers as if they were letters. The following are extended alphabetic characters:

 ! $ % & * + - . / : < = > ? @ ^ _ ~ 

Moreover, all characters whose Unicode scalar values are greater than 127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within identifiers. In addition, any character can be used within an identifier when specified via an inline hex escape. For example, the identifier H\x65;llo is the same as the identifier Hello, and the identifier \x3BB; is the same as the identifier λ.

Any identifier may be used as a variable or as a syntactic keyword (see sections 5.2 and 9.2) in a Scheme program. Any identifier may also be used as a syntactic datum, in which case it represents a symbol (see section 11.10).

From: http://www.r6rs.org/final/html/r6rs/r6rs-ZH-7.html#node_sec_4.2.4


See chapter 2 of the CLHS for a detailed description of the reader algorithm. But the simple answer is that if a token is not a reader macro invocation (section 2.4) and is neither a number nor made up entirely of dots, it is interpreted as a symbol by default.
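
This rule can be checked at the REPL. The following is a minimal sketch, assuming standard reader settings; `token-kind` is a hypothetical helper written for this illustration, not something from the standard.

```lisp
(defun token-kind (string)
  "Read STRING with the standard reader settings and classify the result."
  (let ((object (with-standard-io-syntax (read-from-string string))))
    (cond ((numberp object) :number)
          ((symbolp object) :symbol)
          (t :other))))

(token-kind "123")     ; :NUMBER
(token-kind "+1")      ; :NUMBER, a signed integer
(token-kind "1+")      ; :SYMBOL, not valid number syntax
(token-kind "*foo*")   ; :SYMBOL
(token-kind "a->b?")   ; :SYMBOL
(token-kind "'x")      ; :OTHER, the quote reader macro yields a list
```

A token consisting only of dots, such as `...`, is left out of the calls above because reading it signals a reader error rather than returning a value.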
