What is needed for simple text processing in Haskell?

I am trying to do some simple text processing in Haskell, and I am wondering what the best way to go about it is in a functional language. I looked at the Parsec library, but it seems a lot more complicated than what I need as a new Haskeller. What would be the best way to remove all punctuation from a body of text? My naive approach was to write a function like this:

 removePunc str = [c | c <- str, c /= '.', c /= '?', c /= '!', c /= '-', c /= ';', c /= '\'', c /= '\"']
+8
haskell nlp
3 answers

You can simplify your code to:

 removePunc = filter (`notElem` ".?!-;\'\"") 

or

 removePunc = filter (flip notElem ".?!-;\'\"") 
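
For reference, here is a minimal self-contained sketch of the first form with type signatures added. The name `punctuationChars` is only illustrative and does not come from the question; the character list mirrors the one the question filters out.

```haskell
-- Characters to strip; this mirrors the list in the question.
-- (punctuationChars is an illustrative name, not from the original post.)
punctuationChars :: String
punctuationChars = ".?!-;'\""

-- Keep only the characters that do not occur in punctuationChars.
removePunc :: String -> String
removePunc = filter (`notElem` punctuationChars)
```

With these definitions, `removePunc "Wait... what?!"` evaluates to `"Wait what"`.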
+8

Perhaps a more efficient method (O(log n) per lookup rather than O(n), where n is the number of punctuation characters) would be to use a Set (from Data.Set):

 import qualified Data.Set as S

 punctuation = S.fromList ".?!-;'\""

 removePunc = filter (`S.notMember` punctuation)

You should build the set outside the function so that it is computed only once and shared across all calls, since the cost of constructing the set is much higher than that of a single notElem test like the one suggested in the other answers.

Note: the input here is so small that the extra overhead of a Set can outweigh its asymptotic advantage over a list, so if you care about absolute performance, you should profile both versions.
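
One crude way to check this is to time both versions on the same input. The sketch below is only a rough CPU-time comparison, not a proper benchmark (a dedicated benchmarking library such as criterion would give more trustworthy numbers). The names `punctList`, `punctSet`, `removePuncList`, `removePuncSet`, and `timeIt` are illustrative, and the character list follows the question.

```haskell
import Control.Exception (evaluate)
import qualified Data.Set as S
import System.CPUTime (getCPUTime)

-- The same characters as a plain list and as a set.
punctList :: String
punctList = ".?!-;'\""

-- Top-level constant, so it is built at most once and then shared.
punctSet :: S.Set Char
punctSet = S.fromList punctList

removePuncList :: String -> String
removePuncList = filter (`notElem` punctList)

removePuncSet :: String -> String
removePuncSet = filter (`S.notMember` punctSet)

-- CPU time (in picoseconds) spent running f over the input.
-- Evaluating the length forces the whole filtered string.
timeIt :: (String -> String) -> String -> IO Integer
timeIt f input = do
  start <- getCPUTime
  _ <- evaluate (length (f input))
  end <- getCPUTime
  return (end - start)
```

Running `timeIt removePuncList input` and `timeIt removePuncSet input` on the same large, already evaluated `input` gives two numbers to compare. Repeat the measurement several times, since a single run is noisy.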

+11

You can group characters in a string and use notElem:

 [c | c <- str, c `notElem` ".?!,-;"] 

or in a more functional style:

 filter (\c -> c `notElem` ".?!,") str 
+4
