How to write a parser using Parsec, which accepts only unique elements?

I recently started learning Haskell and tried my hand at Parsec. However, over the past couple of days I have had a problem due to which I could not find a solution. So I'm trying to write a parser that can parse a string like this:

<"apple", "pear", "pineapple", "orange"> 

The code I wrote for this:

 collection :: Parser [String] collection = (char '<') *> (string `sepBy` char ',')) <* (char '>') string :: Parser String string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"' 

This works fine for me as it is able to parse the string I defined above. However, now I would like to ensure compliance with the rule that each element of this collection should be unique, and that is where I have problems. One of the first results that I found when searching the Internet was this one , which suggests using the nub function. Although the problem stated in this question is not the same, it will theoretically solve my problem. But I do not understand how I can apply this function in Parser. I tried to add the nub function to several parts of the above code without any success. Later, I also tried to do this as follows:

  collection :: Parser [String] collection = do char '<' value <- (string `sepBy` char ',')) char '>' return nub value 

But this does not work, as the type does not match the expected nub , which I believe is one of the problems I'm struggling with. I'm also not quite sure that nub is the right way. My fear is that I am going in the wrong direction and that I will not be able to solve my problem like this. Perhaps something is missing me? Any advice or help anyone could provide would be greatly appreciated.

+6
source share
1 answer

The Parsec Parser type is an instance of MonadPlus , which means that we can always crash (i.e. cause a parsing error) whenever we want. A convenient function for this is guard :

 guard :: MonadPlus m => Bool -> m () 

This function takes a boolean value. If it is true, it returns () , and the whole calculation (parsing in this case) is not interrupted. If this is false, all of this fails.

So, as long as you do not care about efficiency, a reasonable approach: analyze the entire list, check if all elements are unique and fail if they are not.

To do this, the first thing we need to do is write a predicate that checks whether each item in the list is unique. nub not entirely correct: it returns a list with all duplicates retrieved. But if we don’t really care about performance, we can use it to check:

 allUnique ls = length (nub ls) == length ls 

Using this predicate, we can write a unique function that wraps any parser that creates a list and ensures that the list is unique:

 unique parser = do res <- parser guard (allUnique res) return res 

Again, if guard gives True , this does not affect the rest of the parsing. But if he gave False , it will cause an error.

Here is how we can use it:

 Ξ»> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">" Right ["apple","pear","pineapple","orange"] Ξ»> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">" Left "<interactive>" (line 1, column 46):unknown parse error 

It does what you want. However, there is a problem: there is no error message in the message. This is not very convenient! Fortunately, we can fix this using <?> . This is a statement provided by Parsec that allows us to set the parser error message.

 unique parser = do res <- parser guard (allUnique res) <?> "unique elements" return res 

Ahhh, much better:

 Ξ»> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">" Left "<interactive>" (line 1, column 46): expecting unique elements 

All this works, but, again, it is worth noting that it is ineffective. It parses the entire list before the items are not unique and nub takes a quadratic time. However, this works, and it is probably more than enough to parse small and medium-sized files: that is, most things are hand-written, not auto-generated.

+6
source

All Articles