Primitive but efficient grep clone in Haskell?

Question


Whenever I consider learning a new language (Haskell in this case), I try to hack together a primitive grep clone to see how good the language implementation and/or its libraries are at text processing, because that is my main use case.

Inspired by the code on the Haskell wiki page, I came up with the following naive attempt:

{-# LANGUAGE FlexibleContexts, ExistentialQuantification #-}

import Text.Regex.PCRE
import System.Environment

-- Apply a line-based transformation to stdin and write the result to stdout.
io :: ([String] -> [String]) -> IO ()
io f = interact (unlines . f . lines)

-- True if the line l matches the regex r.
regexBool :: forall r l .
  (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) =>
  r -> l -> Bool
regexBool r l = l =~ r :: Bool

-- Keep only the lines that match the regex r.
grep :: forall r l .
  (RegexMaker Regex CompOption ExecOption r, RegexLike Regex l) =>
  r -> [l] -> [l]
grep r = filter (regexBool r)

main :: IO ()
main = do
  argv <- getArgs
  io $ grep $ argv !! 0

This does roughly what I want, but unfortunately it is very slow: about 10 times slower than a Python script that does the same thing. I assume the regex library is not at fault, because it calls into PCRE, which should be fast enough (switching to Text.Regex.Posix makes things even slower). So the culprit must be the String implementation, which is instructive from a theoretical point of view but inefficient, from what I have read.

Is there an alternative to String in Haskell that is efficient and convenient (i.e., there is little or no friction when switching to it from String), and that fully and correctly handles Unicode encoded as UTF-8, as well as other encodings without too much trouble, if possible? Something that everyone uses when processing text in Haskell, but that I just don't know about because I'm a complete newbie?

+6

regex grep haskell

dlukes Jul 30 '16 at 15:48


1 answer

runeks · Answer 1 · 2016-07-31T11:09:26+0000

The slow speed may be caused by the standard library's list type. I have often run into performance problems with it in the past.

It would be a good idea to profile your executable to find out where it spends its time: Haskell performance analysis tools. Profiling Haskell programs is very simple (compile with the profiling switch and run your program with the profiling argument added, and a report is written to a text file in the current working directory).
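As a rough sketch of what that workflow can look like, assuming GHC is the compiler: the flags in the comments below are the usual GHC profiling options, not something spelled out in this answer. The helper names (matching, run) and the cost-centre labels are made up for illustration, and a plain substring test stands in for the regex match so that the example only needs base.

```haskell
-- Assumed GHC workflow (shell commands shown as comments):
--   ghc -O2 -prof -fprof-auto -rtsopts Main.hs
--   ./Main PATTERN +RTS -p -RTS < input.txt
-- With these flags the report is written to Main.prof in the current directory.
module Main (main) where

import Data.List (isInfixOf)
import System.Environment (getArgs)

-- Hypothetical stand-in for the regex filter: keep lines containing pat.
matching :: String -> [String] -> [String]
matching pat = filter (pat `isInfixOf`)

-- Each stage gets a manual cost centre (SCC) so that it shows up under
-- its own name in the profile. Without -prof the pragmas are ignored.
run :: String -> String -> String
run pat input =
  let ls   = {-# SCC "split_lines" #-} lines input
      hits = {-# SCC "filter_lines" #-} matching pat ls
  in {-# SCC "join_lines" #-} unlines hits

main :: IO ()
main = do
  args <- getArgs
  case args of
    (pat : _) -> interact (run pat)
    _         -> putStrLn "usage: Main PATTERN < input"
```

The manual SCC annotations are optional when -fprof-auto is used; they are shown only because they make it easy to see whether splitting, matching, or joining dominates the run time.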

As a side note, I use the same approach when learning a new language: build something that works. My experience with Haskell is that I can easily gain an order of magnitude or two in performance by profiling and making relatively simple changes (usually a few lines).
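One change of that kind, offered as an assumption rather than a profiled result, is to swap String for strict ByteString and to compile the regex once instead of once per line. The sketch below assumes the regex-pcre and bytestring packages are installed; grepLines is a name introduced here for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import System.Environment (getArgs)
import Text.Regex.PCRE (Regex, makeRegex, matchTest)

-- Keep only the lines that match an already compiled regex.
grepLines :: Regex -> [B.ByteString] -> [B.ByteString]
grepLines re = filter (matchTest re)

main :: IO ()
main = do
  args <- getArgs
  case args of
    (pat : _) -> do
      -- Compile the pattern once, up front. Note that B.pack keeps only
      -- the low 8 bits of each Char, so this is safe for ASCII patterns only.
      let re = makeRegex (B.pack pat) :: Regex
      B.interact (B.unlines . grepLines re . B.lines)
    _ -> putStrLn "usage: grep-clone PATTERN < input"
```

This version treats input as raw bytes, so it does not by itself address the Unicode part of the question; whether it actually closes the gap to the Python script would need to be confirmed by profiling.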
