Conduit: memory leak

Working on some observations on a previous question ( haskell-data-hashset-from-unordered-container-performance-for-large-sets ) I came across a weird memory leak

module Main where

import System.Environment (getArgs)
import Control.Monad.Trans.Resource (runResourceT)
import Data.Attoparsec.ByteString (sepBy, Parser)
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import Data.Conduit
import qualified Data.Conduit.Attoparsec as CA
import qualified Data.Conduit.Binary as CB
import qualified Data.Conduit.List as CL

main :: IO ()
main = do (args:_) <- getArgs
          writeFile "input.txt" $ unlines $ map show [1..4 :: Int]
          case args of "list" -> m1
                       "fail" -> m2
                       "listlist" -> m3
                       "memoryleak" -> m4
                       --UPDATE
                       "bs-lines":_ -> m5
                       "bs":_ -> m6
                       _ -> putStr $ unlines ["Usage: conduit list"
                                             ,"               fail"
                                             ,"               listlist"
                                             ,"               memoryleak"
                                             --UPDATE
                                             ,"               bs-lines"
                                             ,"               bs"
                                             ]
m1,m2,m3,m4 :: IO ()
m1 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CB.lines
           =$= CA.conduitParser (decimal :: Parser Int)
           =$= CL.map snd
           =$= CL.consume
        print hs
m2 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CA.conduitParser (decimal :: Parser Int)
           =$= CL.map snd
           =$= CL.consume
        print hs
m3 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CB.lines
           =$= CA.conduitParser (decimal `sepBy` (char '\n') :: Parser [Int])
           =$= CL.map snd
           =$= CL.consume
        print hs
m4 = do hs <- runResourceT
            $  CB.sourceFile "input.txt"
            $$ CA.conduitParser (decimal `sepBy` (char '\n') :: Parser [Int])
           =$= CL.map snd
           =$= CL.consume
        print hs
-- UPDATE
m5 = do inpt <- BS.lines <$> BS.readFile "input.txt"
        let Right hs =  mapM (parseOnly (decimal :: Parser Int)) inpt
        print hs
m6 = do inpt <- BS.readFile "input.txt"
        let Right hs =  (parseOnly (decimal `sepBy` (char '\n') :: Parser [Int])) inpt
        print hs

Here is an example output:

$ > stack exec -- example list
[1234]
$ > stack exec -- example listlist
[[1234]]
$ > stack exec -- conduit fail
conduit: ParseError {errorContexts = [], errorMessage = "Failed reading: takeWhile1", errorPosition = 1:2}
$ > stack exec -- example memoryleak
(Ctrl+C)

-- UPDATE
$ > stack exec -- example bs-lines
[1,2,3,4]
$ > stack exec -- example bs
[1,2,3,4]

Now I have questions:

  • Why m1does not it create [1,2,3,4]?
  • Why m2doesn’t work?
  • Why m4does it behave quite differently compared to all other versions and creates a space leak?
+4
source share
2 answers

Why m2 does not work?

Input file as character stream:

1\n2\n3\n4\n

decimal , :

\n2\n3\n4\n

, conduitParser , , .

m4 - ?

decimal `sepBy` (char '\n') \n , :

\n

decimal `sepBy` (char '\n') , , : sepBy . .

m1 [1,2,3,4]?

! , - , , , .

+4

m1: CB.lines, , :

["1\n2\n3\n4\n"]

["1", "2", "3", "4"]

attoparsec "1", , "2" ..

+2