Most computer languages are "technically" not LL, because they are not even context-free. That includes languages which require identifiers to be declared (C, C++, C#, Java, etc.), languages with preprocessors and/or macro facilities (C and C++, among others), and languages with ambiguities that can only be resolved using semantic information (Perl is the worst offender here, but C and C++ are in there too). And, to spread the joy around a bit more, it also includes languages with Python-like layout awareness (Python, of course, and also Haskell).
I put quotes around "technically" because many people will say that these exceptions "don't count". That is their opinion, and they are entitled to it (and in any case I've given up arguing about it, although I don't share that view). You could, for example, eliminate the preprocessor from C/C++ by preprocessing the text before applying the grammar, or handle whitespace-sensitive languages by pre-tokenizing the input and inserting INDENT, NEWLINE and DEDENT tokens in place of the significant whitespace, after which you could make some claim about a mystical "core language". (That is much harder to apply to C++ templates, which can only be eliminated by parsing the text, so you cannot claim that the expansion can be done prior to parsing.)
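To make the layout trick concrete, here is roughly what such a pre-tokenizer looks like. This is a toy sketch in Python of my own devising, not any real implementation; it ignores tabs, comments and bracketed continuation lines:

    def layout_tokens(source):
        # Sketch of layout pre-tokenization: leading whitespace becomes
        # explicit INDENT/DEDENT tokens and each logical line ends with
        # NEWLINE, so a context-free grammar can treat layout as
        # ordinary terminals. (Toy code: real tokenizers also handle
        # tabs, comments, and continuation lines inside brackets.)
        indent_stack = [0]
        for line in source.splitlines():
            body = line.lstrip(' ')
            if not body:
                continue                      # blank line: no layout effect
            width = len(line) - len(body)
            if width > indent_stack[-1]:
                indent_stack.append(width)
                yield ('INDENT', width)
            while width < indent_stack[-1]:
                indent_stack.pop()
                yield ('DEDENT', width)
            yield ('TEXT', body)              # the line's real tokens go here
            yield ('NEWLINE', '')
        while len(indent_stack) > 1:          # close blocks still open at EOF
            indent_stack.pop()
            yield ('DEDENT', 0)

Once INDENT, DEDENT and NEWLINE are ordinary tokens in the stream, a block construct can be described context-freely by something like suite => NEWLINE INDENT statement+ DEDENT.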
The claim that a language is not context-free because it requires identifiers to be declared is perhaps a bit more controversial. In some languages (not C/C++, in which semantic analysis is required to avoid ambiguity), you could argue that the parse tree can be constructed without validating the declaration rule, which makes that rule "non-syntactic". But it remains the case that the declaration rule can be enforced syntactically; you just cannot do it with a context-free grammar (you could use a Van Wijngaarden grammar, for example).
In any case, a common parsing strategy is to recognize a superset of the language and then reject non-conforming programs through subsequent analysis of the resulting parse tree; in that case, the question becomes whether the superset is LL. In my opinion, for this to be meaningful it is necessary that every valid program actually be parsed into the correct syntax tree, which rules out using semantic analysis to disambiguate.
The goal of parsing is to produce a syntax tree, not merely to recognize whether a text is valid or not. (You might miss this important fact if you read formal language textbooks, which tend to focus on recognition, possibly because constructing syntax trees is less interesting.) So it seems reasonable to insist that the proposed grammar actually represent the syntactic structure of the language.
For example, you can recognize algebraic expressions with the following simple regular language:
(PREFIX* OPERAND POSTFIX*) (INFIX PREFIX* OPERAND POSTFIX*)*
where PREFIX is the set of prefix operators as well as (, POSTFIX is the set of postfix operators (if any) as well as ), INFIX is the set of infix operators (addition, subtraction, multiplication, etc.), and OPERAND is an identifier or a constant.
This regular expression will not correctly reject expressions with unbalanced parentheses, but we already agreed that it was fine to recognize a superset of the language, right? :-)
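If you want to play with it, here is a toy instantiation in Python; the particular token classes (one-character operands, ! as a postfix operator, and so on) are mine, purely for illustration:

    import re

    # A toy instantiation of the regular superset above: one-character
    # operands, a few unary and binary operators.
    PREFIX  = r'[-!(]'
    POSTFIX = r'[!)]'
    INFIX   = r'[-+*/]'
    OPERAND = r'[a-z0-9]'

    EXPR = re.compile(
        f'{PREFIX}*{OPERAND}{POSTFIX}*'
        f'(?:{INFIX}{PREFIX}*{OPERAND}{POSTFIX}*)*$'
    )

    print(bool(EXPR.match('a+b*c')))    # True
    print(bool(EXPR.match('(a+b)*c')))  # True
    print(bool(EXPR.match('(a+b*c')))   # True: unbalanced, but in the superset
    print(bool(EXPR.match('a++b')))     # False: '+' is not an operand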
If desired, we can remove the parentheses from the PREFIX and POSTFIX sets and make OPERAND recursive. The resulting grammar is trivially LL(1) [Note 2]:
operand    => IDENTIFIER | CONSTANT | '(' expression ')'
term       => operand | PREFIX operand | operand POSTFIX
expression => term | term INFIX expression
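To make that concrete, a recursive descent parser for this grammar is just a handful of mutually recursive functions. The following is a Python sketch; the (kind, text) token representation and the tuple trees are my own choices, and the common prefixes of term and expression are left-factored, as any implementation would do:

    # Recursive descent sketch for the LL(1) grammar above. Tokens are
    # (kind, text) pairs with kind one of: IDENTIFIER, CONSTANT,
    # PREFIX, POSTFIX, INFIX, '(' or ')'. Trees are nested tuples.
    class Parser:
        def __init__(self, tokens):
            self.tokens = list(tokens) + [('EOF', '')]
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos][0]

        def take(self, kind):
            k, text = self.tokens[self.pos]
            if k != kind:
                raise SyntaxError(f'expected {kind}, got {k}')
            self.pos += 1
            return text

        def operand(self):
            # operand => IDENTIFIER | CONSTANT | '(' expression ')'
            if self.peek() == '(':
                self.take('(')
                tree = self.expression()
                self.take(')')
                return tree
            if self.peek() == 'CONSTANT':
                return self.take('CONSTANT')
            return self.take('IDENTIFIER')   # errors out on anything else

        def term(self):
            # term => operand | PREFIX operand | operand POSTFIX
            if self.peek() == 'PREFIX':
                return (self.take('PREFIX'), self.operand())
            tree = self.operand()
            if self.peek() == 'POSTFIX':
                return (self.take('POSTFIX'), tree)
            return tree

        def expression(self):
            # expression => term | term INFIX expression
            left = self.term()
            if self.peek() == 'INFIX':
                return (self.take('INFIX'), left, self.expression())
            return left

Running it on a-b-c already exposes the problem discussed next: the right-recursive expression production yields the right-associated tree.

    tokens = [('IDENTIFIER', 'a'), ('INFIX', '-'), ('IDENTIFIER', 'b'),
              ('INFIX', '-'), ('IDENTIFIER', 'c')]
    print(Parser(tokens).expression())   # ('-', 'a', ('-', 'b', 'c'))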
The problem, however, is that this grammar does not capture operator precedence. It does not even attempt to recognize the fact that a+b*c and a*b+c are both sums, so that the top-level operator is + in both cases, and not simply whichever operator happens to come first in the expression. (If you were parsing APL, this would not be a problem. But most languages conform to the usual expectations about operator precedence.)
This point is important, because an LL grammar cannot handle left-recursive productions, and you need a left-recursive production to correctly parse a left-associative operator. (That is, to correctly parse a-b-c as ((a-b)-c) rather than (a-(b-c)), which would have a different value.) Again, you might argue that this is a quibble, since you could post-process the parse tree to correct the associativity. That is certainly true, but the result is that the grammar you use to parse is different from the grammar you use to explain the syntax of the language. That might or might not bother you.
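For the record, the standard LL workaround replaces the left recursion with a loop and folds the tree leftward by hand; as a drop-in replacement for the expression method in the sketch above:

    def expression(self):
        # expression => term (INFIX term)*  (iteration instead of left
        # recursion; the loop folds the tree leftward: ((a-b)-c))
        tree = self.term()
        while self.peek() == 'INFIX':
            op = self.take('INFIX')
            tree = (op, tree, self.term())
        return tree

Note that the grammar this loop actually implements, expression => term (INFIX term)*, is not the one you would print in a reference manual, which is exactly the quibble.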
It is worth adding here that there are languages (Haskell and Prolog, and probably many others) in which you can declare operators and their precedences in the program text. That obviously makes it impossible to build the correct syntax tree without semantic analysis, and the usual approach to parsing such languages is to use precisely the simplified "alternating operands and operators" language described above. Once the operator precedences are all known, presumably at the end of the parse, all expressions are then reanalyzed using something like the shunting-yard algorithm in order to produce the correct parse.
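A minimal version of that reanalysis, again as a Python sketch; the PREC table stands in for whatever precedences the program text declared, and every operator here is treated as binary and left-associative:

    # Shunting-yard reanalysis: input is the flat list of operands and
    # infix operators produced by the simplified grammar; PREC is an
    # illustrative precedence table.
    PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

    def reparse(items):
        out, ops = [], []                  # operand stack, operator stack
        def reduce_one():
            op = ops.pop()
            right, left = out.pop(), out.pop()
            out.append((op, left, right))
        for it in items:
            if it in PREC:                 # an infix operator
                while ops and PREC[ops[-1]] >= PREC[it]:   # left-associative
                    reduce_one()
                ops.append(it)
            else:                          # an operand (or sub-tree)
                out.append(it)
        while ops:
            reduce_one()
        return out[0]

    print(reparse(['a', '+', 'b', '*', 'c']))  # ('+', 'a', ('*', 'b', 'c'))
    print(reparse(['a', '*', 'b', '+', 'c']))  # ('+', ('*', 'a', 'b'), 'c')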
Suppose we dismiss the above objections and simply ask, "for which commonly used programming languages can an LL parser be used?"
To avoid cheating, though, we should really require that the LL parser have constant lookahead and not require backtracking. If you allow arbitrary lookahead and backtracking, you considerably expand the domain of parseable languages, but I contend that you are no longer in the realm of LL. That eliminates, for example, the template- and preprocessor-free subset of C++, even though common compiler implementations use recursive descent parsers, since backtracking is required to resolve the "Most Vexing Parse" ambiguity. (In any case, you cannot really remove templates from C++, and parsing with templates is very hard. [Note 3])
Java and Python were both designed to be LL(1) "pseudo-parseable". I believe Haskell also falls into this category. C cannot be correctly parsed without semantic information (the classic ambiguity: is (x)*(y) a cast or a multiplication? It depends on whether x has been typedef'ed or declared as a variable), but I am fairly sure it can be captured with a non-backtracking recursive descent parser. I have not looked at C# in depth, but Eric Lippert's answer suggests that it requires backtracking in some cases.