Whenever I consider learning a new language (Haskell in this case), I try to hack together a primitive grep clone to see how good the language implementation and/or its libraries are at text processing, because that is my main use case.
Inspired by the code on the Haskell wiki page, I came up with the following naive attempt:
    {-# LANGUAGE FlexibleContexts, ExistentialQuantification #-}

    import Text.Regex.PCRE
    import System.Environment

    -- Apply a line-wise transformation to stdin and write the result to stdout.
    io :: ([String] -> [String]) -> IO ()
    io f = interact (unlines . f . lines)

    -- True if the regex r matches somewhere in the line l.
    regexBool :: forall r l . (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) => r -> l -> Bool
    regexBool r l = l =~ r :: Bool

    -- Keep only the lines that match the regex r.
    grep :: forall r l . (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) => r -> [l] -> [l]
    grep r = filter (regexBool r)

    main :: IO ()
    main = do
      argv <- getArgs
      -- The first command-line argument is the pattern.
      io $ grep $ argv !! 0
This is close to what I want, but unfortunately it is very slow: about 10 times slower than a Python script that does the same thing. I assume the regex library is not at fault, because it calls into PCRE, which should be fast enough (switching to Text.Regex.Posix slows things down a bit more). So it must be the String implementation, which is instructive from a theoretical point of view but inefficient, according to what I have read.
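To make concrete what I mean by the String implementation, here is a minimal sketch that uses only the Prelude. As far as I understand, String is just a synonym for [Char], so every line is a linked list with one cell per character. The helper name charCells is hypothetical and only there for illustration; this shows the representation, it does not measure anything.

```haskell
-- In the Prelude, String is a type synonym: type String = [Char].
-- Every character therefore sits in its own list cell.

-- Hypothetical helper (not from any library): the number of list
-- cells a line occupies, which is simply its length in characters.
charCells :: String -> Int
charCells = length

main :: IO ()
main = do
  let line = "needle in a haystack" :: String
  -- A String literal and an explicit list of Char are the same value.
  print (("grep" :: String) == ['g', 'r', 'e', 'p'])
  -- length has to walk the whole list, one cell at a time.
  print (charCells line)
```

If that picture is right, then lines, filter and unlines in my grep all walk such lists character by character, which would explain at least part of the slowdown.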
Is there an alternative to String in Haskell that is both efficient and convenient (i.e. there is little or no friction when switching to it from String), and that fully and correctly handles UTF-8 encoded Unicode, as well as other encodings without too much hassle, if possible? Something that everybody uses when doing text processing in Haskell, but which I just don't know about because I'm a complete beginner?