How to make python style indent / dedent markers with alex / haskell?

I am writing lexer for a small language in Alex with Haskell.

The language is set to have an indented pythonesque with an INDENT marker or a DEDENT marker emitted whenever the indent level changes.

In a traditional imperative language such as C, you would save the global value in the lexer and update it with the level of indentation in each line.

This does not work in Alex / Haskell, because I cannot store any global data anywhere with Haskell, and I cannot put all of my vocabulary rules into any monad or anything else.

So how can I do this? Is it possible? Or will I have to write my own lexer and avoid using alex?

+6
source share
2 answers

Note that in other space-sensitive languages ​​- for example, Haskell - layout processing is actually done in the lexer. GHC actually implements layout processing in Alex. Here's the source:

https://github.com/ghc/ghc/blob/master/compiler/parser/Lexer.x

There are some serious errors in your question that lead you astray, as grockway notes. “I cannot store any global data anywhere with Haskell” in the wrong way. Firstly, you can have a global state, and secondly, you should not use a global state here when Alex fully supports state transitions in rules in a safe manner.

Take a look at the AlexState structure that Alex provides, and you can stream through your lexer. Then see how this state is used in the GHC layout implementation to implement indentation / unindent layout rules. (Find “Layout Handling” in the GHC vocabulary to see how the state is pushed and pushed).

+11
source

I cannot store global data anywhere using Haskell

This is not true; in most cases, something like State monad is enough , but there is also ST monad .

You do not need a global state for this task. Writing a parser consists of two parts; lexical analysis and parsing. Lexical analysis simply turns a stream of characters into a stream of meaningful tokens. Syntax analysis turns tokens into AST; here you have to deal with indentation.

When you interpret indentation, you call the handler function when the indentation level changes - when it increases (nesting), you call the handler function (possibly with one arg increment) if you want to track the indentation level); when the level decreases, you simply return the corresponding part of the AST from the function.

(As an aside, using a global variable for this is something that would not be imperative for me - if you like, it is an instance variable. The state monad is very similar to the concept of this.)

Finally, I think that the phrase "I can’t put all my vocabulary rules inside any monad" means some kind of misunderstanding of the monads. If I needed to parse and save the global state, my code would look like this:

data AST = ... type Step = State Int AST parseFunction :: Stream -> Step parseFunction s = do level <- get ... if anotherFunction then put (level + 1) >> parseFunction ... else parseWhatever ... return node parse :: Stream -> Step parse s = do if looksLikeFunction then parseFunction ... main = runState parse 0 -- initial nesting of 0 

Instead of combining functional applications with (.) Or ($) you combine them with (>>=) or (>>) . In addition, the algorithm is the same. (There is no "monad" to be "inside.")

Finally, you may need applicative functors:

 eval :: Environment -> Node -> Evaluated eval e (Constant x) = Evaluated x eval e (Variable x) = Evaluated (lookup ex) eval e (Function fxy) = (f <$> (`eval` x) <*> (`eval` y)) e 

(or

 eval e (Function fxy) = ((`eval` f) <*> (`eval` x) <*> (`eval` y)) e 

If you have something like "funcall" ... but I'm distracted.)

There is a wealth of parsing literature with applicative functors, monads, and arrows; all of which have the potential to solve your problem. Read them and see what you get.

+5
source

All Articles