It may be a little difficult to implement, but I have no other ideas than this:
To reduce computational complexity, you may have to use some data structure such as KD-Tree or QuadTree.
You can use the KD tree to find the closest neighbor, and that is what you need.
1) You create your kd tree for the first list in O (n log n). This should be done in one thread.
2) For each element in the second list, you search in the kd-tree for the nearest neighbor (the nearest point to the point you are looking for), in O (m log n). If the distance from the current point to the nearest point found is less than your delta, you have one. If you want, you can take this step using multiple threads.
Thus, at the end, the complexity of the algorithm will be O (max (n, m) * log n), where n is the number of elements in the first list, m is the number of elements in the second list.
For KD trees see:
See http://home.wlu.edu/~levys/software/kd/ . This seems like a good implementation in java and C #.
See http://www.codeproject.com/KB/architecture/KDTree.aspx
For four trees:
See http://csharpquadtree.codeplex.com/
See http://www.codeproject.com/KB/recipes/QuadTree.aspx
And of course, look at Wikipedia, what is quadtree and kd-tree
Note that (database 2000 * log 2 (2000)) is about 21931.5
Instead of 2000 * 2000 - 4,000,000, a big difference!
Using a parallel algorithm, if you have 4 processors, a normal O (n * n) algorithm will require 1,000,000 per processor, and I think it will be too big if you need something fast or near real time.