Given your problem, I see no other way to compare each address with any other address if you want to use Lehvenstein distance .
First of all, you must normalize addressing, get rid of abbreviations, etc.
You may have some fixed maximum Lehvenstein (N) distance for similar addresses.
If so, you could interrupt the Lehvenstein algorithm if you are sure that the editing distance for the current pair of addresses is greater than N. For this, you need to write a custom version of the Lehvenstein algorithm. This will make the algorithm a little faster.
There are also some related trivial optimizations. For example: if address A is 10 characters long and address B is 20 characters, and you count addresses that have Lechvenshtein's distance less than 8, it will be the same. You can look at the lengths of addresses and immediately decide that they are not alike.
Juha syrjälä
source share