Here is a working Haskell example. Turns out there are a few tricks to find out before you can get it to work! The zero thing to do is the template: disable the terrible restriction of monomorphism, import some libraries and define some functions that are not in the libraries (but should be):
{-# LANGUAGE NoMonomorphismRestriction #-} import Control.Applicative ((<*)) import Control.Monad import Text.ParserCombinators.Parsec ensure px = guard (px) >> return x singleToken t = tokenPrim id (\pos _ _ -> incSourceColumn pos 1) (ensure (==t)) anyOf xs = choice (map singleToken xs)
Now that the null thing is done ... first we define the data type for our abstract syntax trees. We can just follow the grammar form here. However, to make it more convenient, I have adopted several grammar rules; in particular, two rules
NP => N | Det N | Det Adj N VB => V | V NP
it’s more convenient to write like this when it comes to writing a parser:
NP => N | Det (Adj | empty) N VB => V (NP | empty)
Any good parsing book will have a chapter on why such factoring is a good idea. Thus, type AST:
data Sentence = Complex NounPhrase VerbPhrase | Simple VerbPhrase data NounPhrase = Short Noun | Long Article (Maybe Adjective) Noun data VerbPhrase = VerbPhrase Verb (Maybe NounPhrase) type Noun = String type Verb = String type Article = String type Adjective = String
Then we can make our parser. This grammar follows even more carefully (factorized)! One wrinkle here is that we always want our parser to consume the whole sentence, so we must explicitly ask it to do this by requiring "eof" - or the end of the "file".
s = (liftM2 Complex np vp <|> liftM Simple vp) <* eof np = liftM Short n <|> liftM3 Long det (optionMaybe adj) n vp = liftM2 VerbPhrase v (optionMaybe np) n = anyOf ["i", "you", "bus", "cake", "bear"] v = anyOf ["hug", "love", "destroy", "am"] det = anyOf ["a", "the"] adj = anyOf ["pink", "stylish"]
The last part is the tokenizer. For this simple application, we'll just use space-based tokenization, so the built-in words function works fine. Give it a try! Download the whole file in ghci:
*Main> parse s "stdin" (words "i love the pink cake") Right (Complex (Short "i") (VerbPhrase "love" (Just (Long "the" (Just "pink") "cake")))) *Main> parse s "stdin" (words "i love pink cake") Left "stdin" (line 1, column 3): unexpected "pink" expecting end of input
Here, Right indicates successful parsing, and Left indicates an error. The column number indicated in the error is actually the number of the word where the error occurred, due to how we calculate the source positions in singleToken .