Incomplete input issue when using Attoparsec

I am converting some working Haskell code from Parsec to Attoparsec in the hope of getting better performance. I have made the changes and everything compiles, but my parser does not work correctly.

I am parsing a file consisting of different types of records, one per line. Each of my individual functions for parsing a record or comment works correctly, but when I try to write a function that parses a sequence of records, the parser always returns a partial result because it expects more input.

These are the two main variations I've tried. Both have the same problem.

    items :: Parser [Item]
    items = sepBy (comment <|> recordType1 <|> recordType2) endOfLine

For the second one, I changed the record and comment parsers to consume the end-of-line characters themselves.

    items :: Parser [Item]
    items = manyTill (comment <|> recordType1 <|> recordType2) endOfInput

Is there something wrong with my approach? Is there any other way to achieve what I'm trying to do?

haskell attoparsec
3 answers

I've run into this problem before, and my understanding is that it is caused by how <|> works in the definition of sepBy:

    sepBy1 :: Alternative f => f a -> f s -> f [a]
    sepBy1 p s = scan
      where scan = liftA2 (:) p ((s *> scan) <|> pure [])

This will only move on to pure [] once (s *> scan) has failed, and that does not happen merely because you have reached the end of your input: the parser asks for more input instead.

My solution has been to call feed with an empty ByteString on the Result returned by parse. This may be something of a hack, but it also seems to be how attoparsec-iteratee deals with the issue:

    f k (EOF Nothing) = finalChunk $ feed (k S.empty) S.empty

As far as I can tell, this is the only reason that attoparsec-iteratee works here and plain old parse does not.
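The workaround can be sketched as follows. This is a minimal illustration rather than the asker's code: the Item type and the comment and record parsers are hypothetical stand-ins, and the Data.Attoparsec.ByteString.Char8 variant of the API is assumed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8 (Parser, Result, IResult(..))
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for the question's Item type.
data Item = Comment B.ByteString | Record Int
  deriving (Eq, Show)

-- Hypothetical comment parser: '#' followed by the rest of the line.
comment :: Parser Item
comment = Comment <$> (A.char '#' *> A.takeWhile (/= '\n'))

-- Hypothetical record parser: a single decimal number.
record :: Parser Item
record = Record <$> A.decimal

items :: Parser [Item]
items = (comment <|> record) `A.sepBy` A.endOfLine

-- Summarise a Result without needing a Show instance for the continuation.
describe :: Show a => Result a -> String
describe (Fail _ _ err) = "Fail: " ++ err
describe (Partial _)    = "Partial (parser wants more input)"
describe (Done rest x)  = "Done " ++ show x ++ ", leftover " ++ show rest

main :: IO ()
main = do
  let r = A.parse items "# header\n1\n2"
  -- The last record runs up to the end of the chunk, so the parser
  -- cannot tell whether more digits or another separator will follow.
  putStrLn (describe r)
  -- Feeding an empty chunk tells attoparsec that the input is finished.
  putStrLn (describe (A.feed r B.empty))
```

The first line printed should report a Partial result and the second a Done result, since the empty chunk is how attoparsec is told that no more input is coming.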


If you write an attoparsec parser that consumes as much input as possible before failing, you must tell the partial result's continuation when you have reached the end of your input.
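When the whole input is already in memory, one way to do this without handling the continuation by hand is attoparsec's parseOnly, which runs a parser that cannot be resupplied with more input. It is not mentioned in the answers here, so treat the following as a sketch under that assumption; the number-per-line format is a hypothetical stand-in for the real records.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as A

-- Hypothetical record format: one decimal number per line.
numbers :: Parser [Int]
numbers = A.decimal `A.sepBy` A.endOfLine

main :: IO ()
main =
  -- parseOnly never returns a Partial result; adding endOfInput makes
  -- the parse fail if anything is left unconsumed.
  print (A.parseOnly (numbers <* A.endOfInput) "1\n2\n3")
```

For input that arrives in chunks, parse and feed remain the appropriate tools, with an empty chunk marking the end.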


You give rather little information, so it is hard to give you good help. However, there are a few comments I can offer:

  • Perhaps the parser does not realize that the input is finished, and it depends on getting either an EOL or another record. It therefore returns a partial result. Try feeding it the equivalent of an EOL in the hope that this forces it to finish.
  • I don't remember the code, but using the Alternative instance could be detrimental to parsing performance. If so, you might want to branch explicitly on comment and the recordTypes.
  • I use cereal for a lot of binary parsing, and it is also very fast. attoparsec seems better as a text parser, though. You should definitely consider this option.
  • Another option is to use iteratee-based IO in the longer run. John Lato wrote an excellent article on iteratees in the latest Monad Reader (issue #16, I think). The end condition is then the iteratee's to signal. Be aware that the iteratee types are quite complex and take some time to get used to.
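The second suggestion above, branching explicitly instead of trying each alternative in turn, might look like the sketch below. It assumes, purely for illustration, that comments start with '#' and that every other line is a numeric record; Item, comment and record are hypothetical stand-ins, and whether this is actually faster than <|> would need measuring.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for the question's Item type.
data Item = Comment B.ByteString | Record Int
  deriving (Eq, Show)

comment :: Parser Item
comment = Comment <$> (A.char '#' *> A.takeWhile (/= '\n'))

record :: Parser Item
record = Record <$> A.decimal

-- Look at the next character without consuming it and pick a parser
-- directly, instead of letting <|> try each alternative in turn.
item :: Parser Item
item = do
  c <- A.peekChar'
  if c == '#' then comment else record

main :: IO ()
main = print (A.parseOnly (item `A.sepBy` A.endOfLine <* A.endOfInput) "# note\n42")
```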