Physical distance between two places

I need to measure the physical distance between two places whose names are listed as strings. Since sometimes the names are spelled slightly differently, I was looking for a library that could help me measure the difference, and then combine it with a measure of latitude and longitude to select the correct matches. Preferred languages: Java or PHP.

Any suggestions?

+9
java string php distance
May 25 '09 at 20:43
6 answers

Look at the Levenshtein distance. It is a way to measure how different two strings are from each other.

I hope I understood your question correctly; using "distance" in the same sentence as "latitude and longitude" can be confusing!
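For the latitude/longitude half of the question, a great-circle distance is the usual companion to a string distance. Below is a minimal sketch of the haversine formula in Java. The class name `GeoDistance`, the method name `haversineKm` and the choice of a mean Earth radius are illustrative assumptions, and the formula treats the Earth as a sphere, so expect small errors over long distances.

```java
final class GeoDistance {
    // Mean Earth radius in kilometres (spherical approximation, an assumption of this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Great-circle distance between two points given in decimal degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
                 + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;

        // Central angle between the two points, in radians.
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }

    public static void main(String[] args) {
        // Paris to London, coordinates in decimal degrees.
        System.out.println(haversineKm(48.8566, 2.3522, 51.5074, -0.1278));
    }
}
```

One way to use it: accept two records as the same place only when the names are close in edit distance and the coordinates are within some radius of each other. Both thresholds would have to be tuned on real data.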

+6
May 25, '09 at 20:45

Although it is written in C (with Python and Tcl bindings), libdistance is a library for computing several distance metrics on strings and other data.

Metrics:

  • Bloom
  • damerau
  • Euclid
  • Hamming
  • Jaccard
  • Levenshtein
  • Manhattan
  • Minkowski
  • needleman_wunsch
+4
May 25 '09 at 20:59

You can get decent results using a phonetic algorithm to catch slightly misspelled names.
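As one example of a phonetic algorithm, here is a minimal sketch of American Soundex in Java. The class name `PlaceSoundex` is made up for illustration, non-ASCII letters are simply skipped (an assumption that will not suit every data set), and Soundex is only one of several phonetic schemes that could be used here.

```java
final class PlaceSoundex {
    // Soundex digit for each letter A..Z; '0' marks letters without a code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    // Returns the four-character Soundex code, e.g. a letter followed by three digits.
    static String encode(String name) {
        StringBuilder out = new StringBuilder(4);
        char lastCode = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char ch = Character.toUpperCase(name.charAt(i));
            if (ch < 'A' || ch > 'Z') {
                continue; // skip spaces, hyphens, digits and non-ASCII letters
            }
            char code = CODES.charAt(ch - 'A');
            if (out.length() == 0) {
                out.append(ch);      // the first letter is kept as-is
                lastCode = code;
            } else if (ch == 'H' || ch == 'W') {
                // H and W are transparent: they do not separate equal codes
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                lastCode = code;     // a vowel resets lastCode, so equal codes around it are both kept
            }
        }
        while (out.length() < 4) {
            out.append('0');         // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Pittsburgh") + " " + encode("Pittsburg"));
    }
}
```

Two names whose codes are equal are candidates for the same place; the coordinates can then be used to confirm or reject the match.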

In addition, if you use a more mechanical edit distance, you are likely to see better results with a weighted function that takes keyboard geometry into account (that is, physically close keys are "cheaper" to substitute than distant ones). This is a patented method, by the way, so be careful not to write something that becomes too popular ;)
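To make the idea concrete, here is a rough sketch of an edit distance whose substitution cost depends on key proximity. Everything specific in it is an assumption chosen for illustration: a US QWERTY layout, the row stagger values, the 1.5 key-width adjacency threshold, the 0.5 cost for neighbouring keys and the class name `KeyboardEditDistance`. It is not taken from any particular published or patented method.

```java
final class KeyboardEditDistance {
    // Letter rows of a US QWERTY keyboard (assumed layout).
    private static final String[] ROWS = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};
    // Horizontal stagger of each row, in key widths (assumed values).
    private static final double[] OFFSETS = {0.0, 0.25, 0.75};

    // Returns {x, y} of the key, or null for characters not on the letter rows.
    private static double[] position(char c) {
        char lower = Character.toLowerCase(c);
        for (int r = 0; r < ROWS.length; r++) {
            int col = ROWS[r].indexOf(lower);
            if (col >= 0) {
                return new double[] {col + OFFSETS[r], r};
            }
        }
        return null;
    }

    // 0 for the same letter (ignoring case), 0.5 for neighbouring keys, 1 otherwise.
    static double substitutionCost(char a, char b) {
        if (Character.toLowerCase(a) == Character.toLowerCase(b)) {
            return 0.0;
        }
        double[] pa = position(a);
        double[] pb = position(b);
        if (pa == null || pb == null) {
            return 1.0;
        }
        double d = Math.hypot(pa[0] - pb[0], pa[1] - pb[1]);
        return d < 1.5 ? 0.5 : 1.0;
    }

    // Two-row dynamic programme; insertions and deletions always cost 1.
    static double distance(String a, String b) {
        double[] prev = new double[b.length() + 1];
        double[] curr = new double[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                double sub = prev[j - 1] + substitutionCost(a.charAt(i - 1), b.charAt(j - 1));
                double del = prev[j] + 1.0;
                double ins = curr[j - 1] + 1.0;
                curr[j] = Math.min(sub, Math.min(del, ins));
            }
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Paris", "Paros"));
    }
}
```

With these costs, a typo that hits a neighbouring key counts as half an edit, so such misspellings rank ahead of unrelated names at the same plain edit distance.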

+1
May 25 '09 at 22:22

I found SimMetrics for Java, but I have not used it.

0
May 25 '09 at

I took the liberty of translating some C# code that I wrote for calculating the Levenshtein distance into Java. It uses only two one-dimensional arrays that alternate, instead of one big two-dimensional array:

public static int getDifference(String a, String b) {
    // Minimize the amount of storage needed:
    if (a.length() > b.length()) {
        // Swap:
        String x = a;
        a = b;
        b = x;
    }

    // Store only two rows of the matrix, instead of a big one
    int[] mat1 = new int[a.length() + 1];
    int[] mat2 = new int[a.length() + 1];

    int i;
    int j;

    for (i = 1; i <= a.length(); i++)
        mat1[i] = i;

    mat2[0] = 1;

    for (j = 1; j <= b.length(); j++) {
        for (i = 1; i <= a.length(); i++) {
            int c = (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);

            mat2[i] = Math.min(mat1[i - 1] + c,
                      Math.min(mat1[i] + 1, mat2[i - 1] + 1));
        }

        // Swap:
        int[] x = mat1;
        mat1 = mat2;
        mat2 = x;

        mat2[0] = mat1[0] + 1;
    }

    // It's row #1 because we swap rows at the end of each outer loop,
    // and we want to return the last number on the last row
    return mat1[a.length()];
}

This has not been rigorously tested, but it seems to work fine. It is based on a Python implementation that I wrote for a university exercise. Hope this helps!

0
May 25, '09 at 21:50

I would recommend Levenshtein Distance or Jaccard Distance for comparing text.
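For the Jaccard option, here is a minimal sketch that compares the sets of character bigrams of two names. Using bigrams rather than words, lowercasing before comparison and the class name `BigramJaccard` are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BigramJaccard {
    // Set of overlapping two-character substrings of the lowercased, trimmed input.
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT).trim();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    // Jaccard distance: 1 - |A ∩ B| / |A ∪ B|, in the range [0, 1].
    static double distance(String a, String b) {
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 0.0; // both inputs too short to form a bigram: treat as identical
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        int unionSize = ga.size() + gb.size() - intersection.size();
        return 1.0 - (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        System.out.println(distance("Pittsburgh", "Pittsburg"));
    }
}
```

Unlike Levenshtein, this measure ignores character order beyond adjacent pairs, so it tends to be more forgiving of transposed words in multi-word place names.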

0
May 26, '09 at 13:34