There is no fundamental reason, apart from the decisions of language designers and a history of single-token identifiers. Some languages do in fact allow multi-token identifiers: the expression language of MultiMedia Fusion, some spreadsheet or notebook software for the Mac whose name escapes me, and I'm sure there are others. However, several considerations make the problem non-trivial.
Assuming the language is free-form, you need a canonical representation, so that an identifier like account name is treated the same regardless of whitespace. The compiler would probably need some mangling convention to please the linker. Then you have to consider the effect of this on foreign exports, which is why C++ has the extern "C" linkage specifier to disable mangling.
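To make the canonical-representation point concrete, here is a minimal sketch in Python. The function names and the length-prefixed mangling scheme are purely hypothetical, chosen only for illustration; a real compiler would follow whatever convention its linker and ABI expect.

```python
def canonicalize(identifier: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return " ".join(identifier.split())


def mangle(identifier: str) -> str:
    """Turn a multi-word identifier into a single linker-friendly symbol.

    Hypothetical scheme: the prefix "_M", then each word preceded by its
    length, so "account name" and "account_name" cannot collide.
    """
    words = canonicalize(identifier).split(" ")
    return "_M" + "".join(f"{len(word)}{word}" for word in words)


# Different spellings of the same identifier share one canonical form.
same = canonicalize("account   name") == canonicalize("account\n\tname")
symbol = mangle("account  name")
```

With this scheme, every whitespace variant of account name maps to the same symbol, while account_name written as a single word gets a different one.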
Keywords are a problem, as you saw. Most C-family languages have a lexical class of keywords, distinct from identifiers, that is not context-sensitive. You cannot name a variable class in C++. This can be solved by disallowing keywords in multi-token identifiers:
if account age < 13 then child account = true;
Here, if and then cannot be part of an identifier, so there is no ambiguity with account age and child account. Alternatively, you can require punctuation everywhere:
if (account age < 13) { child account = true; }
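As a rough sketch of the first approach, a lexer can glue consecutive non-keyword words into one identifier and let keywords, numbers, and punctuation end it. The keyword set, token names, and token pattern below are assumptions made for this toy language, not part of any real implementation.

```python
import re

# Assumed reserved words for the toy language; they may never appear in an identifier.
KEYWORDS = {"if", "then", "else", "true", "false"}

# One word, one integer, or one punctuation token, with leading whitespace skipped.
TOKEN = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(<=|>=|==|[<>=;(){}]))")


def tokenize(source):
    tokens = []
    words = []  # words of the identifier currently being collected

    def flush():
        # Emit the collected words as a single multi-token identifier.
        if words:
            tokens.append(("ident", " ".join(words)))
            words.clear()

    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        word, number, punct = match.groups()
        if word is not None and word not in KEYWORDS:
            words.append(word)  # still inside an identifier
        else:
            flush()  # anything else terminates the identifier
            if word is not None:
                tokens.append(("keyword", word))
            elif number is not None:
                tokens.append(("number", number))
            else:
                tokens.append(("punct", punct))
        pos = match.end()
    flush()
    return tokens


tokens = tokenize("if account age < 13 then child account = true;")
```

Here account age and child account each come out as a single ident token, because if, then, the operators, and the semicolon all act as delimiters.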
The final option is to make keywords pervasively context-sensitive, which leads to monstrosities like:
IF IF = THEN THEN ELSE = THEN ELSE THEN = ELSE
The biggest problem is that juxtaposition is an extremely powerful syntactic construct, and you don't want to use it up lightly. Allowing multi-token identifiers prevents using juxtaposition for another purpose, such as function application or composition. I think it is much better to allow most non-whitespace characters and thus permit identifiers like canonical-venomous-frobnicator. That is still plenty readable, but with less scope for ambiguity.
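A minimal sketch of that last idea, assuming a toy token pattern in which a hyphen directly between word characters belongs to the identifier (the pattern and function name are illustrative only):

```python
import re

# An identifier is a word optionally followed by "-word" segments;
# a hyphen that stands alone is still an operator.
HYPHEN_TOKEN = re.compile(r"[A-Za-z_]\w*(?:-\w+)*|\d+|[-+*/=]")


def tokenize_hyphenated(source):
    # findall skips anything that matches no alternative, such as whitespace.
    return HYPHEN_TOKEN.findall(source)


one_name = tokenize_hyphenated("canonical-venomous-frobnicator")
subtraction = tokenize_hyphenated("total - discount")
```

The trade-off in this sketch is that subtraction needs whitespace around the operator, since total-discount would be read as a single identifier.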
Jon Purdy