There are two steps to parsing a text input stream:
Lexical analysis: This is where your input stream is broken into lexical units. It looks at a sequence of characters and generates tokens (analogous to words in a spoken or written language). Finite state machines are very good at lexical analysis, provided you have made good design decisions about the lexical structure. From your data above, individual tokens would be things like your keywords (for example, "global"), identifiers (for example, "bitwise", "SOURCES"), symbolic characters (for example, "{", "}", ".", "/"), numeric values, escape values (for example, "\n"), etc.
Parsing / grammar analysis: Once you have generated a sequence of tokens (or perhaps while you are doing so), you need to analyze the structure to determine whether the token sequence is consistent with your language design. For this you generally need some kind of parser, although if the language structure is not very complicated, you may be able to do it with a finite state machine instead. In general (and in your case in particular, since you want nested structures), you will need to use one of the techniques that Ken Bloom describes.
Therefore, in response to your questions:
Should I use an enumeration or abstract class + derivatives for my states?
I have found that for small tokenizers, a matrix of state/transition values is suitable, something like next_state = state_transitions[current_state][current_input_char]. In this case, next_state and current_state are of some integer type (possibly an enumerated type). Input errors are detected when you transition to an invalid state. The end of a token is identified when you are in a valid end (accepting) state and there is no valid transition to another state for the next input character. If you are concerned about space, you could use a vector of maps instead. Making the states classes is possible, but I think that is probably making things more difficult than you need.
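To make the table idea concrete, here is a minimal sketch in C++. The state names, the Token struct, and the functions build_table and tokenize are all made up for illustration, and the token rules (identifiers, integers, single-character symbols, whitespace as a separator) are an assumption about your format, not something taken from your data.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical states for a toy lexer. INVALID means "no transition".
enum State : int { START, IDENT, NUMBER, SYMBOL, INVALID, NUM_STATES };

using Table = std::array<std::array<State, 256>, NUM_STATES>;

// Builds state_transitions[current_state][current_input_char].
Table build_table() {
    Table t;
    for (auto& row : t) row.fill(INVALID);
    for (int c = 0; c < 256; ++c) {
        if (std::isalpha(c) || c == '_') {
            t[START][c] = IDENT;
            t[IDENT][c] = IDENT;
        } else if (std::isdigit(c)) {
            t[START][c] = NUMBER;
            t[NUMBER][c] = NUMBER;
            t[IDENT][c] = IDENT;     // digits may continue an identifier
        } else if (std::ispunct(c)) {
            t[START][c] = SYMBOL;    // SYMBOL has no outgoing transitions
        }
    }
    return t;
}

struct Token {
    State kind;
    std::string text;
};

// Appends tokens to out; returns false on a lexical error.
bool tokenize(const std::string& input, std::vector<Token>& out) {
    static const Table table = build_table();
    std::size_t i = 0;
    while (i < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;                     // whitespace only separates tokens
            continue;
        }
        State state = START;
        const std::size_t begin = i;
        // Follow transitions until the next character has none.
        while (i < input.size()) {
            State next = table[state][static_cast<unsigned char>(input[i])];
            if (next == INVALID) break;
            state = next;
            ++i;
        }
        if (state == START) return false;  // nothing matched: bad character
        assert(state != INVALID);          // every state reached here is accepting
        out.push_back({state, input.substr(begin, i - begin)});
    }
    return true;
}
```

In this sketch every state other than START is an accepting state, so a token ends as soon as the next character has no transition. With more states (for example, partway through an escape sequence) you would also need to check that the state you stopped in is an accepting one.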
When reading the list, I need to NOT ignore \n.
You can either create a token called "\n", or a more generalized escape token (an identifier preceded by a backslash). If you are talking about identifying line breaks in the source, then those are simply characters that you need to create transitions for in your state transition matrix (be aware of the difference between Unix and Windows line breaks, although you can create an FSM that handles both).
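As a rough illustration of handling both line-break conventions in one FSM, here is a two-state sketch in C++. The function name split_lines and the choice to also accept a lone "\r" as a line break are assumptions for the example; in a real tokenizer you would more likely emit a newline token at the same points instead of starting a new string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits text into lines, treating "\n", "\r\n" and a lone "\r"
// each as exactly one line break.
std::vector<std::string> split_lines(const std::string& text) {
    enum State { NORMAL, SAW_CR };
    State state = NORMAL;
    std::vector<std::string> lines(1);  // start with one empty line
    for (char c : text) {
        if (state == SAW_CR) {
            state = NORMAL;
            if (c == '\n') continue;    // "\r\n": break already emitted at '\r'
        }
        if (c == '\r') {
            lines.emplace_back();       // line break; remember the '\r'
            state = SAW_CR;
        } else if (c == '\n') {
            lines.emplace_back();       // plain Unix line break
        } else {
            lines.back() += c;
        }
    }
    return lines;
}
```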
Will simple enumeration states be sufficient for multilevel parsing (scopes within scopes, { ... { ... } ... }), or will this require hacky implementations?
This is where you will need a grammar or a pushdown automaton, unless you can guarantee that the nesting will not exceed a certain depth. Even then, it is likely to make your FSM very complex.
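The reason a plain FSM falls short is that it has no memory of how many scopes are open; a pushdown automaton adds a stack for exactly that. Here is a minimal sketch in C++ that works on an already-tokenized input. The function name max_scope_depth and the convention that the token just before "{" names the scope are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the deepest nesting level reached, or -1 if the braces
// are unbalanced.
int max_scope_depth(const std::vector<std::string>& tokens) {
    // The pushdown stack. Scope names are kept so that a real parser
    // could tell which scope the current item belongs to.
    std::vector<std::string> scopes;
    std::string last_ident;
    int deepest = 0;
    for (const std::string& tok : tokens) {
        if (tok == "{") {
            scopes.push_back(last_ident);          // enter a scope
            deepest = std::max(deepest, static_cast<int>(scopes.size()));
        } else if (tok == "}") {
            if (scopes.empty()) return -1;         // "}" with nothing open
            scopes.pop_back();                     // leave the scope
        } else {
            last_ident = tok;                      // candidate scope name
        }
    }
    return scopes.empty() ? deepest : -1;          // leftover "{" is an error
}
```

The stack can grow as deep as the input requires, which is what a fixed set of enumerated states cannot do without one state per nesting level.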
Here is a draft showing what I mean: ...
See my comments on lexical and grammar analysis above.