Indices of all regular expression matches

I am trying to match all occurrences of a regular expression and get indexes as a result. An example from Real World Haskell says I can do

string =~ regex :: [(Int, Int)] 

However, this no longer works because the regular expression library has been updated since RWH was published. (See "All Haskell Regular Expression Matches" and "=~" raises "No instance for (RegexContext Regex [Char] [String])".) What is the right way to do this?

Update:

I found matchAll, which might give me what I want. I don't know how to use it, though.

1 answer

The key to using matchAll is to add the :: Regex type annotation when creating the regular expression:

    import Text.Regex
    import Text.Regex.Base

    re = makeRegex "[^aeiou]" :: Regex
    test = matchAll re "the quick brown fox"

matchAll returns a list of arrays, one per match. To get a list of (offset, length) pairs, take the element at index 0 of each array:

    import Data.Array ((!))

    -- Index 0 of each array describes the whole match.
    matches = map (! 0) $ matchAll re "the quick brown fox"
      -- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]
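If the matched text itself is needed, the (offset, length) pairs can be turned back into substrings. The following is only a minimal sketch using plain list functions; `sliceMatch` and `matchedTexts` are hypothetical helper names, not part of any regex package, and the pairs are copied by hand from the result above.

```haskell
-- Sketch only: sliceMatch and matchedTexts are hypothetical helpers,
-- not functions from a regex library.

-- | Cut the substring described by an (offset, length) pair out of the input.
sliceMatch :: String -> (Int, Int) -> String
sliceMatch input (offset, len) = take len (drop offset input)

-- | Recover the matched text for every (offset, length) pair.
matchedTexts :: String -> [(Int, Int)] -> [String]
matchedTexts input = map (sliceMatch input)

-- The first four pairs from the matchAll result for "[^aeiou]".
example :: [String]
example = matchedTexts "the quick brown fox" [(0,1),(1,1),(3,1),(4,1)]
-- ["t","h"," ","q"]
```

For long inputs, repeated `drop` on a String is linear in the offset each time, so a different string type may be a better fit there.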

As for the =~ operator, things may have changed since RWH. You must use the predefined types MatchOffset and MatchLength together with the special type constructor AllMatches:

    import Text.Regex.Posix

    re = "[^aeiou]"
    text = "the quick brown fox"

    test1 = text =~ re :: Bool
      -- True

    test2 = text =~ re :: String
      -- "t"

    test3 = text =~ re :: (MatchOffset, MatchLength)
      -- (0,1)

    test4 = text =~ re :: AllMatches [] (MatchOffset, MatchLength)
      -- (not showable)

    test4' = getAllMatches $ (text =~ re :: AllMatches [] (MatchOffset, MatchLength))
      -- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]

For more information on which contexts are available, see the Text.Regex.Base.Context docs.

UPDATE: I believe the AllMatches type constructor was introduced to resolve the ambiguity that arises when the regular expression has subexpressions. For example:

    import Text.Regex.Posix

    text = "axx ayy"
    re = "a(.)([^a])"

    test1 = getAllMatches (text =~ re :: AllMatches [] (MatchOffset, MatchLength))
      -- [(0,3),(4,3)]
      -- returns the locations of "axx" and "ayy" but no subexpression info

    test2 = text =~ re :: MatchArray
      -- array (0,2) [(0,(0,3)),(1,(1,1)),(2,(2,1))]
      -- returns only the match with "axx"

Both are essentially lists of (offset, length) pairs, but they mean different things: the first has one pair per match, the second has one pair per subexpression of a single match.
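To make the two shapes concrete, here is a small sketch that needs no regex package. It assumes `MatchArray` is an array of (offset, length) pairs indexed by group number and uses a local stand-in type, `MatchArray'`, for it. The data is written by hand to describe the matches of "a(.)([^a])" in "axx ayy"; it is not produced by running the library.

```haskell
import Data.Array (Array, elems, listArray, (!))

-- Local stand-in for MatchArray (assumption: an array of
-- (offset, length) pairs indexed by subexpression number).
type MatchArray' = Array Int (Int, Int)

-- Hand-written data: one array per match of "a(.)([^a])" in "axx ayy".
-- Index 0 is the whole match; indices 1 and 2 are the subexpressions.
allMatchArrays :: [MatchArray']
allMatchArrays =
  [ listArray (0, 2) [(0, 3), (1, 1), (2, 1)]  -- "axx", "x", "x"
  , listArray (0, 2) [(4, 3), (5, 1), (6, 1)]  -- "ayy", "y", "y"
  ]

-- One pair per match: where each whole match sits in the input.
wholeMatches :: [(Int, Int)]
wholeMatches = map (! 0) allMatchArrays

-- One pair per group: the first match together with its subexpressions.
firstMatchGroups :: [(Int, Int)]
firstMatchGroups = concatMap elems (take 1 allMatchArrays)
```

Here `wholeMatches` has the meaning of the AllMatches result, while `firstMatchGroups` has the meaning of a single MatchArray, which is why the result type has to say which one is wanted.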
