First, some completely unscientific benchmarking:
All programs were compiled with the default optimisation level (`-O3` for gcc, `-O2` for GHC) and run with

```
time ./prog > outfile
```
As a baseline, the C program took 1.07 s to create a ~76 MB file (78888897 bytes), roughly 70 MB/s.
1. The "naive" Haskell program (`forM [1 .. 10000000] $ \j -> putStrLn (show j)`) took 8.64 s, about 8.8 MB/s.
2. The same with `forM_` instead of `forM` took 5.64 s, about 13.5 MB/s.
3. The version from the `ByteString` answer took 9.13 s, about 8.3 MB/s.
4. The `Text` version from dflemstr's answer took 5.64 s, about 13.5 MB/s.
5. The `Vector` version from the question took 5.54 s, about 13.7 MB/s.
6. `main = mapM_ (C.putStrLn . C.pack . show) $ [1 :: Int .. 10000000]`, where `C` is `Data.ByteString.Char8`, took 4.25 s, about 17.9 MB/s.
7. `putStr . unlines . map show $ [1 :: Int .. 10000000]` took 3.06 s, about 24.8 MB/s.
8. A hand-rolled loop,

   ```haskell
   main = putStr $ go 1
     where
       go :: Int -> String
       go i | i > 10000000 = ""
            | otherwise    = shows i . showChar '\n' $ go (i+1)
   ```

   took 2.32 s, about 32.75 MB/s.
9. `main = putStrLn $ replicate 78888896 'a'` took 1.15 s, about 66 MB/s.
10. `main = C.putStrLn $ C.replicate 78888896 'a'`, where `C` is `Data.ByteString.Char8`, took 0.143 s, about 530 MB/s; the numbers are about the same for lazy `ByteString`s.
What can we learn from this?
First, don't use `forM` or `mapM` unless you really want to collect the results. Performance-wise, they suck.
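To illustrate the difference on a toy list (the benchmark above ran over ten million elements), a minimal sketch:

```haskell
import Control.Monad (forM, forM_)

main :: IO ()
main = do
  -- forM collects every result into a list (here just useless ()s,
  -- ten million of them in the benchmark) that must be built and returned:
  _units <- forM [1 :: Int .. 3] print   -- _units :: [()]
  -- forM_ discards each result immediately:
  forM_ [1 :: Int .. 3] print
```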
Next, `ByteString` output can be very fast (10.), but if building the `ByteString` for output is slow (3.), you end up with code slower than the naive `String` output.
What's so terrible about 3.? Well, all the `String`s involved are very short. So you get a list of

```
Chunk "1234567" Empty
```

values, and between any two of those a `Chunk "\n" Empty` is placed; then the resulting list is concatenated, which means all those `Empty`s are discarded while a `... (Chunk "1234567" (Chunk "\n" (Chunk "1234568" (...))))` is constructed. That's a lot of wasteful construction-deconstruction-reconstruction. Speed comparable to the `Text` version and the fixed "naive" `String` version can be achieved by `pack`ing to strict `ByteString`s and using `fromChunks` (with `Data.List.intersperse` for the newlines). Even better performance, slightly better than 6., can be had by eliminating the costly singletons: if you stick the newline onto each `String` with `\k -> shows k "\n"` instead of `show`, only half as many concatenations are needed, and the chunks being concatenated are slightly longer than singleton `ByteString`s.
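A minimal sketch of the `pack`/`fromChunks`/`intersperse` approach just described (the helper names `buildOutput` and `buildOutput'` are mine; the second variant uses the `shows`-based newline trick):

```haskell
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L
import Data.List (intersperse)

-- Pack each number into a strict chunk, place singleton "\n" chunks
-- between them, and glue everything into one lazy ByteString.
buildOutput :: Int -> L.ByteString
buildOutput n =
  L.fromChunks . intersperse (C.singleton '\n') $
    map (C.pack . show) [1 .. n]

-- Variant with the newline fused into each chunk via shows,
-- halving the number of chunks:
buildOutput' :: Int -> L.ByteString
buildOutput' n = L.fromChunks [ C.pack (shows i "\n") | i <- [1 .. n] ]

main :: IO ()
main = L.putStr (buildOutput' 10000000)
```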
I'm not familiar enough with the internals of `text` or `vector` to offer more than a semi-educated guess about the reasons for their observed performance, so I'll leave them alone. Suffice it to say that their performance gain is, at best, minimal compared to the fixed naive `String` version.
Now, 6. shows that `ByteString` output is faster than `String` output, enough so that in this case the extra work of `pack`ing is more than compensated. However, don't let that fool you into believing it is always so: if the `String`s to pack are long, the packing can take longer than the `String` output would.
But ten million calls to `putStrLn`, whether with `String` or `ByteString`, take a lot of time. It's faster to grab the `stdout` `Handle` just once and construct the output in non-`IO` code. `unlines` already does well, but we still suffer from the construction of the list `map show [1 .. 10^7]`. Unfortunately, the compiler didn't manage to eliminate that (though it did eliminate `[1 .. 10^7]`, which is already quite good). So let's do it ourselves, leading to 8. That's not too terrible, but it still takes more than twice as long as the C program.
One can make a faster Haskell program by going low-level and filling a `ByteString` directly, instead of going through `String` via `show`, but I don't know whether C speed is attainable that way. Anyway, that low-level code isn't very pretty, so I'll spare you what I have of it; but sometimes one has to get one's hands dirty if speed matters.
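For the curious, here is a sketch in that direction using `Data.ByteString.Builder` from the `bytestring` package (`intDec` and `hPutBuilder` require bytestring >= 0.10; this is not the low-level code alluded to above, just an illustration of rendering `Int`s straight into an output buffer without any intermediate `String`):

```haskell
import Data.ByteString.Builder
  (Builder, char7, hPutBuilder, intDec, toLazyByteString)
import System.IO
  (BufferMode (BlockBuffering), hSetBinaryMode, hSetBuffering, stdout)

-- Render each Int directly into the output buffer, no String involved.
-- (toLazyByteString can be used to inspect the rendered bytes in memory.)
numbers :: Int -> Builder
numbers n = mconcat [ intDec i <> char7 '\n' | i <- [1 .. n] ]

main :: IO ()
main = do
  hSetBinaryMode stdout True
  hSetBuffering stdout (BlockBuffering Nothing)
  hPutBuilder stdout (numbers 10000000)
```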