I think you can get reasonable performance by carefully crafting the parts. The problem part is memory. With enough memory, you could avoid recording and merging.
When processing a single document, you can convert it to BitSet when each bit is set, if the corresponding word is present.
Your attitude is symmetrical, so I hope you save (a, b, count) with a < b .
Counting requires something like Multiset<Pair<String, String>> , but there are more memory storage structures. Your words are numbered, so each can be represented int , and a pair can be represented using long . Maybe something like LongIntHashMap . You need concurrency so that you can use the aromatics for records or split the map into parts of N (after some hashing with N more than the number of cores) and synchronize. Simply create something on top of AtomicIntegerArray .
You did not say whether there is a chance that your result will fit into memory, but if so, this can lead to tremendous acceleration.
Requested explanation
Lines are numbered from 0 to one million, which corresponds to an int value. Two of these numbers fit together in long , which can be used as a key for TLongIntHashMap . For each document, you define all the corresponding String lines, get the corresponding long and increase the value in TLongIntHashMap .
Here only an increment is required when locking. Since this lock prevents concurrency, I suggested using multiple cards, each with its own lock. The increment can be grouped so that multiple operations can be performed with a single lock.
A better solution would be to use one TIntIntHashMap for each word. Imagine that you put all the words (represented as int s) found in the document in Set. Then you can make a loop as follows
for (int w1 : words) { getLock(w1).lock(); TIntIntHashMap map = getMap(w1); for (int w2 : words) { if (isLess(w1, w2) map.increment(w2); } getLock(w1).unlock(); }
Here isLess is an arbitrary antisymmetric non-reflexive relation used to avoid preserving both (a, b) and (b, a) . While just w1 < w2 , this will result in rather unbalanced values ( getMap(0) will probably be large and getMap(1000000) will be empty). Using ((w1 - w2) ^ ((w1 + w2) << 31)) < 0 should be done.