@yura, this is not the answer you are looking for, but I don’t think any algorithm, however smart, can consistently decide whether a query like "soma ca" refers to Soma in San Francisco or Soma Lake in Canada. The problem is not that your algorithm is not sophisticated enough; the problem is that the query "soma ca" simply does not contain enough information.
I do not know how to express this clearly, but it is an information-theoretic limit. It is similar to the fact that random data cannot be compressed without loss: the input does not contain enough information to compute the desired result.
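To make the point concrete, here is a minimal sketch. The two gazetteer entries, their token sets, and the functions `match_score` and `rank` are all made up for illustration and are not taken from your code. It assumes a scorer that sees only the query tokens, and shows that such a scorer has nothing with which to separate the two places.

```python
# Hypothetical gazetteer entries; names and token sets are illustrative only.
CANDIDATES = {
    "Soma, San Francisco": {"soma", "ca"},        # "ca" read as California
    "Soma Lake, Canada": {"soma", "lake", "ca"},  # "ca" read as Canada
}


def match_score(query, tokens):
    """Count how many query tokens appear among the candidate's tokens."""
    return sum(1 for token in query.lower().split() if token in tokens)


def rank(query):
    """Return every candidate that achieves the best score, sorted by name."""
    scores = {name: match_score(query, toks) for name, toks in CANDIDATES.items()}
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


if __name__ == "__main__":
    print(rank("soma ca"))       # both candidates tie
    print(rank("soma lake ca"))  # the extra token breaks the tie
```

Both candidates match every token of "soma ca", so any tie-break has to come from outside the query (a prior, user location, and so on). Adding one more token, as in "soma lake ca", is what actually supplies the missing information.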
, , "soma ca" , , Soma SF. , 2- "ca" "" , , "" , . , ad-hoc, ad-hoc log(population), .
"" ( , ):
- . , , , .
- , , .
- , , , , , , . .
- , , . , , .