Why does the so-called haskell interception counter method often call sigprocmask?

REVISED SUMMARY

Well, it seems that system calls are definitely GC related, and the main problem is that GC happens too often. This seems to be related to using splitWhen and pack, as far as I can tell by profiling.

The splitWhen implementation converts each fragment from lazy to strong text and combines them all as it creates a chunk buffer. This is bound to set aside a lot.

pack, since it converts one type to another, should stand out, and this is in my inner loop, which also makes sense.

ORIGINAL ISSUE

I stumbled upon some unexpected syscall activity in haskell-based IQ based enumerator. Hoping someone can shed light on him.

I play with the haskell perc script quick access version that I once wrote for several months, on and off. The script reads in some json from each line and then prints a specific field if it exists.

Here is the perl version and how I run it.

cat ~/sample_input | perl -lpe '($_) = grep(/type/, split(/,/))' > /dev/null 

Here is the haskell version (it is called similarly to the perl version).

 {-# LANGUAGE OverloadedStrings #-} import qualified Data.Enumerator as E import qualified Data.Enumerator.Internal as EI import qualified Data.Enumerator.Text as ET import qualified Data.Enumerator.List as EL import qualified Data.Text as T import qualified Data.Text.IO as TI import Data.Functor import Control.Monad import qualified Data.Text.Lazy as TL import qualified Data.Text.Lazy.IO as TLI import System.Environment import System.IO (stdin, stdout) import GHC.IO.Handle (hSetBuffering, BufferMode(BlockBuffering)) fieldEnumerator field = enumStdin E.$= splitOn [',','\n'] E.$= grabField field enumStdin = ET.enumHandle stdin splitOn :: [Char] -> EI.Enumeratee T.Text T.Text IO b splitOn chars = (ET.splitWhen (`elem` chars)) grabField :: String -> EI.Enumeratee T.Text T.Text IO b grabField = EL.filter . T.isInfixOf . T.pack intercalateNewlines = EL.mapM_ (\field -> (TI.putStrLn field >> (putStr "\n\n"))) runE enum = E.run_ $ enum E.$$ intercalateNewlines main = do (field:_) <- getArgs runE $ fieldEnumerator field 

Surprisingly, the haskell version trace looks something like this (the actual JSON is suppressed because it is data from work), while the perl version does what I expect; a bunch of reads followed by a record is repeated.

 55333/0x8816f5: 366125 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 366136 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 367209 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 367218 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 368449 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 368458 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 369525 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 369534 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 370610 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 370620 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 371735 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 371744 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 371798 5 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 371802 3 1 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 372907 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 372918 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 374063 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 374072 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 375147 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 375156 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 376283 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 376292 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 376809 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 376814 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 377378 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 377387 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 378537 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 378546 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 379598 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 379604 3 0 sigreturn(0x7FFF5FBFF9A0, 0x1E, 0x1) = 0 Err#-2 55333/0x8816f5: 379613 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 380667 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 380678 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 381862 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 381871 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 382032 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 382036 4 2 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 383064 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 383073 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 384118 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 384127 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 385206 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 385215 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 386348 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 386358 3 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 387468 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 387477 11 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 387614 6 2 select(0x1, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 387620 5 3 read(0x0, SOME_JSON, 0x1FA0) = 8096 0 55333/0x8816f5: 388597 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 388606 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 389707 3 0 sigprocmask(0x1, 0x10069BFA8, 0x10069BFAC) = 0x0 0 55333/0x8816f5: 389716 2 0 sigprocmask(0x3, 0x10069BFAC, 0x0) = 0x0 0 55333/0x8816f5: 390261 7 3 select(0x2, 0x7FFF5FBFBA70, 0x7FFF5FBFB9F0, 0x0, 0x7FFF5FBFBAF0) = 1 0 55333/0x8816f5: 390273 6 3 write(0x1, SOME_OUTPUT, 0x1FA0) = 8096 0 
+7
source share
4 answers

Are you worried about distribution or call (overhead?) At sigprocmask?

If this is the first one and you want to use the enumerator package, this small change will help set the 4k test at about 50%: 8 MB of distributions are reduced to 4 MB, and gen0 GC from 15 to 6.

 splitOn :: EI.Enumeratee T.Text T.Text IO b splitOn = EL.concatMap (T.split fastSplit) fastSplit :: Char -> Bool fastSplit ',' = True fastSplit '\n' = True fastSplit _ = False 

To (statistics from +RTS -sstderr -RTS ):

  8,212,680 bytes allocated in the heap
          696,184 bytes copied during GC
          148,656 bytes maximum residency (1 sample (s))
           30,664 bytes maximum slop
                2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed) Avg pause Max pause
   Gen 0 15 colls, 0 par 0.00s 0.00s 0.0001s 0.0005s
   Gen 1 1 colls, 0 par 0.00s 0.00s 0.0010s 0.0010s

After:

  3,838,048 bytes allocated in the heap
          689,592 bytes copied during GC
          148,368 bytes maximum residency (1 sample (s))
           27,040 bytes maximum slop
                2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed) Avg pause Max pause
   Gen 0 6 colls, 0 par 0.00s 0.00s 0.0001s 0.0003s
   Gen 1 1 colls, 0 par 0.00s 0.00s 0.0006s 0.0006s

This is quite a reasonable improvement, but, of course, leaves much to be desired. Instead of listing too many with my feet, I took a hit, rewriting it into pipeline-0.4.1 for kicks only. It should be equivalent ...

 import Data.Conduit as C import qualified Data.Conduit.Binary as Cb import qualified Data.Conduit.List as Cl import qualified Data.Conduit.Text as Ct import qualified Data.Text as T import qualified Data.Text.IO as TI import Control.Monad.Trans (MonadIO, liftIO) import System.Environment import System.IO (stdin) grabField :: Monad m => String -> Conduit T.Text m T.Text grabField = Cl.filter . T.isInfixOf . T.pack printField :: MonadIO m => T.Text -> m () printField field = liftIO $ do TI.putStrLn field putStr "\n\n" fastSplit :: Char -> Bool fastSplit ',' = True fastSplit '\n' = True fastSplit _ = False main :: IO () main = do field:_ <- getArgs runResourceT $ Cb.sourceHandle stdin $$ Ct.decode Ct.utf8 =$ Cl.concatMap (T.split fastSplit) =$ grabField field =$ Cl.mapM_ printField 

... but for some reason allocates and saves less memory:

  835,688 bytes allocated in the heap
            8,576 bytes copied during GC
           87,200 bytes maximum residency (1 sample (s))
           19,968 bytes maximum slop
                1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed) Avg pause Max pause
   Gen 0 1 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
   Gen 1 1 colls, 0 par 0.00s 0.00s 0.0008s 0.0008s
+7
source

Moving this to the top level from the comments:

FWIW, I look at the runtime (we also discuss this in IRC), and there are only two uses of sigprocmask: GC and tty driver. The latter is unlikely, I recommended profiling to verify that he does a lot of GC and try to find out why.

And it turns out (from IRC) that it makes 90 MB of distribution for 0.5 MB of data, and the garbage collector does run quite a lot. So now this is because the enumerator makes so much extra allocation.

+7
source

If the amount of data read between these sigetsmacs is large, the first assumption from the top of my head is that the runtime executes sigsetmask before gc starts, so gc does not interrupt the heap to an inconsistent state.

+4
source

More than a comment and less answer: if you grep through the GHC source, you will see that posix/TTY.c (code TERMIOS) and sm/GC.c (via {,un}blockUserSignals ) have the most likely candidates. You can compile the GHC using debugging symbols or simply enter some dummy (unique) system calls to make sure that you can distinguish between two system call profiles to find out. Another cheap test would be to remove any terminal interactions, and if the masking behavior disappears, this will be soft evidence of GC support (no answer).

EDIT: I have to admit that some library code might call sigprocmask too, I ignored this as a less likely source, but it could actually be a problem!

+3
source

All Articles