Proper use of ReadP in Haskell

I made a very simple parser for lists of numbers in a file using ReadP in Haskell. It works, but it is very slow ... is this normal behavior of this type of parser or am I something wrong?

import Text.ParserCombinators.ReadP import qualified Data.IntSet as IntSet import Data.Char setsReader :: ReadP [ IntSet.IntSet ] setsReader = setReader `sepBy` ( char '\n' ) innocentWhitespace :: ReadP () innocentWhitespace = skipMany $ (char ' ') <++ (char '\t' ) setReader :: ReadP IntSet.IntSet setReader = do innocentWhitespace int_list <- integerReader `sepBy1` innocentWhitespace innocentWhitespace return $ IntSet.fromList int_list integerReader :: ReadP Int integerReader = do digits <- many1 $ satisfy isDigit return $ read digits readClusters:: String -> IO [ IntSet.IntSet ] readClusters filename = do whole_file <- readFile filename return $ ( fst . last ) $ readP_to_S setsReader whole_file 
+6
parsing haskell
source share
1 answer

setReader has exponential behavior because it allows you to eliminate spaces between numbers. So for the line:

 12 34 56 

He sees these analyzes:

 [1,2,3,4,5,6] [12,3,4,5,6] [1,2,34,5,6] [12,34,5,6] [1,2,3,4,56] [12,3,4,56] [1,2,34,56] [12,34,56] 

You could see how this could fail for long lines. ReadP returns all valid parses in increasing order of length, so you must go through all these intermediate paragraphs to get to the last parsing. Change:

 int_list <- integerReader `sepBy1` innocentWhitespace 

To:

 int_list <- integerReader `sepBy1` mandatoryWhitespace 

For a suitable definition of mandatoryWhitespace for squash this exponential behavior. The parsec strategy used by parsec is more resistant to this error because it is greedy - when it consumes input in this branch, it is bound to this branch and never returns (unless you explicitly set it). Therefore, once he has correctly analyzed 12 , he will never return to analysis 1 2 . Of course, this means that it is important in which order you set out your choice, which always seems a little sick to me to think.

Also I would use:

 head [ x | (x,"") <- readP_to_S setsReader whole_file ] 

To extract valid parsing of an entire file if it consumes the entire input very quickly, but there are one hundred basic ways of interpreting this input. If you don’t care about the ambiguity, you are more likely to return the first than the last, because the first will be faster.

+12
source share

All Articles