What is the best way to disambiguate place names in location data?

What is the best method for disambiguating locations in location data?

GeoNames search has several scoring algorithms, but they are not open, and I'm not sure they are very sophisticated. For example, for "soma" it returns Soma Lake in Canada, which does not even have a Wikipedia article, rather than the much more popular SoMa neighborhood in San Francisco.

I also found some papers through Google Scholar, but they seem very shallow and look much like my own heuristics, for example: score = log(population) + 1000*hasWikipedia(article) + 100*isCity + 10*isCapital.
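To make the heuristic above concrete, here is a minimal sketch in Python. The `Candidate` fields, the `max(population, 1)` guard for places with no recorded population, and the sample values are all assumptions for illustration; they do not come from GeoNames or from any paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible interpretation of a place name (hypothetical fields)."""
    name: str
    population: int
    has_wikipedia: bool
    is_city: bool
    is_capital: bool


def score(c: Candidate) -> float:
    # log(population) + 1000*hasWikipedia + 100*isCity + 10*isCapital.
    # Assumption: treat a missing or zero population as 1 so the log is defined.
    return (
        math.log(max(c.population, 1))
        + 1000 * c.has_wikipedia
        + 100 * c.is_city
        + 10 * c.is_capital
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Highest score first.
    return sorted(candidates, key=score, reverse=True)


# Made-up illustrative values, not real gazetteer data.
candidates = [
    Candidate("Soma Lake, Canada", 0, False, False, False),
    Candidate("SoMa, San Francisco", 0, True, False, False),
]
best = rank(candidates)[0]
```

With these sample values the Wikipedia term alone decides the ranking, which shows how sensitive such a hand-tuned score is to its weights.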

My domain is travel articles, so my scoring function should favor the most likely tourist destinations: cities and places of interest (Disneyland, the Colosseum, Big Ben).

Do you know of any important papers in this area, or of the algorithms used by Google Maps, Yahoo, Bing, or even GeoNames?

1 answer

@yura, this is not the answer you are looking for, but I don't think any algorithm, however clever, can reliably decide whether a query like "soma ca" refers to SoMa in San Francisco or Soma Lake in Canada. The problem is not that your algorithm is not sophisticated enough; the problem is that the query "soma ca" simply does not contain enough information.

I do not know how to express this clearly, but there is an information-theoretic argument here. It is similar to the fact that random data cannot be losslessly compressed: the input does not contain enough information to compute the desired result.

, , "soma ca" , , Soma SF. , 2- "ca" "" , , "" , . , ad-hoc, ad-hoc log(population), .

"" ( , ):

  • . , , , .
  • , , .
  • , , , , , , . .
  • , , . , , .