First off, one thing about your list of components does not make sense. Building the AST is (to a large extent) what parsing does, so the parsing step either should not be listed separately, or should at least come before the AST.
You have a lexer. All that gives you is individual tokens. In any case, you will need an actual parser, because regular languages are no fun to program in. You cannot even (properly) nest expressions. Heck, you cannot even handle operator precedence. A token stream does not give you:
- An idea of where statements and expressions begin and end.
- An idea of how statements are grouped into blocks.
- An idea of which part of an expression has which precedence, associativity, etc.
- A clear, uncluttered view of the actual structure of the program.
- A structure that can be passed through many transformations without every single pass knowing, and having code to handle, that the condition of an if is enclosed in parentheses.
- ... more generally, any kind of understanding above the level of a single token.
Suppose you have two passes in your compiler that optimize certain kinds of operators applied to certain arguments (say, constant folding and algebraic simplifications like x - x -> 0 ). If you hand these passes the tokens for the expression x - x * 1 , they get cluttered with figuring out that the x * 1 part is evaluated first. And they have to know this, or the transformation would be wrong (consider 1 + 2 * 3 , where folding 1 + 2 first would give 9 instead of 7).
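To make this concrete, here is a minimal sketch of such a pass working on a tree instead of tokens. The node classes (Num, Var, BinOp) and the simplify function are names invented for this illustration, not part of any particular compiler, and the rewrites assume plain integer arithmetic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str        # one of "+", "-", "*"
    left: object
    right: object

def simplify(node):
    """Constant folding plus two algebraic rewrites, applied bottom-up."""
    if not isinstance(node, BinOp):
        return node
    left, right = simplify(node.left), simplify(node.right)
    # Constant folding: both operands are known numbers.
    if isinstance(left, Num) and isinstance(right, Num):
        ops = {"+": lambda a, b: a + b,
               "-": lambda a, b: a - b,
               "*": lambda a, b: a * b}
        return Num(ops[node.op](left.value, right.value))
    # x * 1 -> x
    if node.op == "*" and right == Num(1):
        return left
    # x - x -> 0 (the two subtrees are structurally equal)
    if node.op == "-" and left == right:
        return Num(0)
    return BinOp(node.op, left, right)

# x - x * 1, as a parser would build it: the multiplication is a subtree.
expr = BinOp("-", Var("x"), BinOp("*", Var("x"), Num(1)))
# 1 + 2 * 3, again with the multiplication nested below the addition.
const = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))

print(simplify(expr))
print(simplify(const))
```

Note that simplify contains no precedence logic at all. The pass only looks at a node and its children; which operand belongs to which operator was already decided when the tree was built.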
These things are tricky enough to get right as it is, so you don't want to be bothered by parsing problems as well. That is why you solve the parsing problem first, in a separate parsing stage. Then you can, say, replace a function call with the function's definition without worrying about adding parentheses so that the meaning stays the same. You save time, you separate concerns, you avoid repetition, you enable simpler code in many other places, etc.
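The inlining case can be sketched in a few lines. Everything here is an assumption made for the example: the tree is represented as nested tuples, variables as strings, and the function being inlined is a hypothetical double(x) = x * 2.

```python
def substitute(node, name, replacement):
    """Replace every variable `name` in the tree with the tree `replacement`."""
    if isinstance(node, str):      # a variable reference
        return replacement if node == name else node
    if isinstance(node, int):      # a literal
        return node
    op, left, right = node         # an operator node such as ("+", left, right)
    return (op,
            substitute(left, name, replacement),
            substitute(right, name, replacement))

def evaluate(node, env):
    """Evaluate a tree that uses only "+" and "*"."""
    if isinstance(node, str):
        return env[node]
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left, env), evaluate(right, env)
    return a + b if op == "+" else a * b

# Body of the hypothetical double(x) = x * 2, and the call argument a + b.
body = ("*", "x", 2)
arg = ("+", "a", "b")

# Inlining on the tree keeps the argument grouped as one subtree.
inlined = substitute(body, "x", arg)

# The same substitution on source text silently changes the meaning.
textual = "x * 2".replace("x", "a + b")
```

On the tree, the argument stays a single subtree under the multiplication, so with a = 1 and b = 2 the inlined expression evaluates to 6. The textual result "a + b * 2" would evaluate to 5 unless every such pass remembered to add parentheses.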
The parser figures all this out and builds the AST, which therefore contains all of this information. Without any additional data annotated on the nodes, the shape of the AST alone gives you items 1, 2 and 3 of the list above, and more, for free. None of the bazillion passes that follow has to worry about this anymore.
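As a rough illustration of how the shape alone carries precedence and associativity, here is a small precedence-climbing parser for arithmetic expressions. It is a sketch under simplifying assumptions: the tokenize and parse names are made up, the tree is nested tuples, all operators are left-associative, and there is no error handling for malformed input.

```python
import re

def tokenize(src):
    # Lexer: numbers, identifiers, and single-character operators/parentheses.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", src)

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}  # all left-associative

def parse(tokens):
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = expression(1)
            pos += 1               # skip the closing ")"
            return node
        return int(tok) if tok.isdigit() else tok

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left associativity: the right operand must bind strictly tighter.
            right = expression(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    return expression(1)

print(parse(tokenize("1 + 2 * 3")))
print(parse(tokenize("(1 + 2) * 3")))
print(parse(tokenize("8 - 3 - 2")))
```

The parentheses from the source do not appear anywhere in the result. Grouping is expressed purely by which node is nested under which, so later passes never need to look for them.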
This does not mean that you always have to have an AST. For sufficiently simple languages, you can write a single-pass compiler. Instead of generating an AST or some other intermediate representation during parsing, you emit code as you go. However, this gets harder for less simple languages, and you cannot sensibly do many things (for example, 70% of all optimizations and diagnostics; and yes, I just made that number up). In general, I would not advise you to do this. There are good reasons that single-pass compilers are mostly dead. Even languages that permit them (such as C) are nowadays implemented with multiple passes and an AST. It is an easy way to get started, but it will severely limit you (and the language, if you are also designing it).
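For comparison, this is roughly what "emit code as you go" looks like for a tiny expression language. It is only a sketch: the stack-machine instructions (PUSH, ADD, SUB, MUL) and the compile_expr and run functions are invented for the example, and only integers, + , - , * and parentheses are supported.

```python
import re

def compile_expr(src):
    """Parse and emit stack-machine code in one pass; no tree is ever built."""
    tokens = re.findall(r"\d+|[-+*()]", src)
    pos = 0
    code = []

    def factor():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            expr()
            pos += 1               # skip the closing ")"
        else:
            code.append(("PUSH", int(tok)))

    def term():
        nonlocal pos
        factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            factor()
            code.append(("MUL",))  # emitted as soon as both operands are parsed

    def expr():
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            code.append(("ADD",) if op == "+" else ("SUB",))

    expr()
    return code

def run(code):
    """Tiny interpreter for the emitted instructions."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[instr[0]])
    return stack.pop()

print(compile_expr("1 + 2 * 3"))
print(run(compile_expr("(1 + 2) * 3")))
```

The limitation shows up immediately: by the time the compiler knows that 2 * 3 has two constant operands, the PUSH instructions are already in the output. Folding it would mean going back and patching emitted code, which is exactly the kind of work a tree makes easy.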