It all depends on how you are going to access it. And, of course, how many of them are going to access it. Read on MapReduce .
If you are going to collapse your own, you will need to create an index file, which is a map view between unique words and a tuple (file, line, offset). Of course, you can think of other data structures in memory, such as trie (prefix-tree) a Judy array and the like ...
Some third-party solutions are listed here .
source share