The two nested loops you wrote are a great starting point. That is, we can write a tokens function that delegates its work to two recursive functions outer and inner that correspond to your loops:
type Token a = ([a], Int, Int) tokens :: [a] -> [Token a] tokens = outer 0 where outer _ [] = [] outer i l@ (_ : xs) = inner i [] l ++ outer (i + 1) xs where inner _ _ [] = [] inner j acc (x : xs) = (acc ++ [x], i, j) : inner (j + 1) (acc ++ [x]) xs
Here outer iterates over the line and for each starting position inside this line calls inner to collect all the segments that begin at that position along with their ending positions.
Although this feature meets your requirements,
> tokens "blah" [("b",0,0),("bl",0,1),("bla",0,2),("blah",0,3),("l",1,1),("la",1,2),("lah",1,3),("a",2,2),("ah",2,3),("h",3,3)]
it is rather inefficient due to repeated concatenation of lists. A more efficient version will accumulate its results in the so-called difference lists :
type Token a = ([a], Int, Int) tokens :: [a] -> [Token a] tokens l = outer 0 l [] where outer _ [] = id outer i l@ (_ : xs) = inner i id l . outer (i + 1) xs where inner _ _ [] = id inner j acc (x : xs) = ((acc [x], i, j) :) . inner (j + 1) (acc . (x :)) xs
How to build a dictionary, of course, depends on how you imagine it. Here's an approach that uses simple ordered lists of associations,
type Dict a = [([a], [(Int, Int)])] empty :: Dict a empty = [] update :: Ord a => Token a -> Dict a -> Dict a update (xs, i, j) [] = [(xs, [(i, j)])] update (xs, i, j) ((ys, ns) : dict) = case compare xs ys of LT -> (xs, [(i, j)]) : (ys, ns) : dict EQ -> (ys, (i, j) : ns) : dict GT -> (ys, ns) : update (xs, i, j) dict toDict :: Ord a => [a] -> Dict a toDict = foldr update empty . tokens
but since your keys are strings, trying (aka prefix trees) is probably the best choice.
If this is an effective subtitle that you want about, I would recommend looking for suffix trees, although their implementation is somewhat involved. you can check
- Robert Gigerich and Stefan Kurtz. Comparison of imperative and purely functional suffix constructions. Computer Science Science 25 (2-3): 187-218, 1995
and Bryan O'Sullivan suffixtree package for Hackage.