I think the solution that might help here is Beider-Morse Phonetic Matching (BMPM).
Beider-Morse Phonetic Matching (BMPM) is a "sound tool" that lets you search using a newer phonetic matching system.
So, for example, the words "tilevizor" and "TV" will be treated as similar, and we will get a match. One thing that could be tweaked is the phonetic matching algorithm. Solr supports many of them, and I'm not sure which one works best: DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone (v2.0), ColognePhonetic or Nysiis.
In addition, I would use solr.ICUTransformFilterFactory with id="Russian-Latin/BGN", which converts Russian characters to Latin characters much better.
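For comparison, here is a minimal sketch of a field type that uses BMPM itself through Solr's solr.BeiderMorseFilterFactory. The field type name phonetic_bm is hypothetical, and the attribute values are only a starting point that I have not tuned for Russian data.

```xml
<!-- Hypothetical field type name; attribute values are untuned starting points. -->
<fieldType name="phonetic_bm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- nameType: GENERIC, ASHKENAZI or SEPHARDIC; ruleType: APPROX or EXACT.
         languageSet="auto" lets the filter guess the language of each token. -->
    <filter class="solr.BeiderMorseFilterFactory"
            nameType="GENERIC"
            ruleType="APPROX"
            concat="true"
            languageSet="auto"/>
  </analyzer>
</fieldType>
```

To try one of the other algorithms instead, changing the encoder attribute of solr.PhoneticFilterFactory (for example encoder="DoubleMetaphone") should be enough; the index has to be rebuilt after any such change.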
<fieldType name="spell_ru" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Russian-Latin/BGN"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Caverphone"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Russian-Latin/BGN"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Caverphone"/>
  </analyzer>
</fieldType>
The field type above does the trick in many cases, for example:

q=title:tilevizor
SolrDocument{title= samsung, _version_=1583123812650582016}
SolrDocument{title=televizor , _version_=1583123812667359232}

q=title:
SolrDocument{title= samsung, _version_=1583123812650582016}
SolrDocument{title=televizor , _version_=1583123812667359232}

q=title:smasung
SolrDocument{title= samsung, _version_=1583123812650582016}
SolrDocument{title=televizor , _version_=1583123812667359232}
SolrDocument{title= samsung, _version_=1583123812684136448}
SolrDocument{title=galaxy , _version_=1583123812684136449}
I created a test class here, feel free to play with it.
Mysterion