REVISED SUMMARY
Well, it seems that system calls are definitely GC related, and the main problem is that GC happens too often. This seems to be related to using splitWhen and pack, as far as I can tell by profiling.
The splitWhen implementation converts each fragment from lazy to strong text and combines them all as it creates a chunk buffer. This is bound to set aside a lot.
pack, since it converts one type to another, should stand out, and this is in my inner loop, which also makes sense.
ORIGINAL ISSUE
I stumbled upon some unexpected syscall activity in haskell-based IQ based enumerator. Hoping someone can shed light on him.
I play with the haskell perc script quick access version that I once wrote for several months, on and off. The script reads in some json from each line and then prints a specific field if it exists.
Here is the perl version and how I run it.
cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null
Here is the haskell version (it is called similarly to the perl version).
{-# LANGUAGE OverloadedStrings #-} import qualified Data.Enumerator as E import qualified Data.Enumerator.Internal as EI import qualified Data.Enumerator.Text as ET import qualified Data.Enumerator.List as EL import qualified Data.Text as T import qualified Data.Text.IO as TI import Data.Functor import Control.Monad import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.IO as TLI import System.Environment import System.IO (stdin, stdout) import GHC.IO.Handle (hSetBuffering, BufferMode(BlockBuffering)) fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field enumStdin = ET.enumHandle stdin splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b splitOn chars = (ET.splitWhen (`elem` chars)) grabField :: String -> EI.Enumeratee T.Text T.Text IO b grabField = EL.filter . T.isInfixOf . T.pack intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n"))) runE enum = E.run_ $ enum E.$$ intercalateNewlines main = do (field:_) <- getArgs runE $ fieldEnumerator field
Surprisingly, the haskell version trace looks something like this (the actual JSON is suppressed because it is data from work), while the perl version does what I expect; a bunch of reads followed by a record is repeated.
55333/0x8816f5: 366125 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 366136 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 367209 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 367218 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 368449 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 368458 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 369525 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 369534 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 370610 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 370620 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 371735 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 371744 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 371798 5 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 371802 3 1 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 372907 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 372918 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 374063 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 374072 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 375147 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 375156 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 376283 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 376292 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 376809 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 376814 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 377378 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 377387 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 378537 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 378546 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 379598 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 379604 3 0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1) = 0 Err