Zend Lucene - Tokenizing Swedish Characters

I use Zend Lucene to index Swedish texts. The problem is that Lucene splits words at the Swedish characters å, ä and ö. For example, the word "världens" becomes the two words "v" and "rldens" in the index.

Is there a way to tell Zend Lucene which additional characters it should accept as part of a word instead of splitting on them?

+4
2 answers

Use a UTF-8 compatible text analyzer instead of the default analyzer for tokenization. Note that this requires PHP's PCRE (Perl-compatible regular expressions) library to be compiled with UTF-8 support. That is the case by default when you use the PCRE library bundled with PHP, but it may not be if PHP is linked against a shared system library. For the case-insensitive versions of the UTF-8 compatible analyzers, you also need to enable the mbstring extension.
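A minimal sketch of that switch, assuming Zend Framework 1 is on the include path with its autoloader registered. The helper names `pcre_supports_utf8` and `use_utf8_analyzer` are made up for illustration; the `Zend_Search_Lucene_*` classes are the ones that ship with Zend_Search_Lucene.

```php
<?php
// Hypothetical helper: true when PCRE understands the /u modifier and
// Unicode properties, which the UTF-8 analyzers depend on.
function pcre_supports_utf8()
{
    // Without UTF-8 support preg_match() fails and returns false here.
    return @preg_match('/\pL/u', 'a') === 1;
}

// Hypothetical helper: installs a UTF-8 analyzer as the default.
// Returns false when the prerequisites are missing.
function use_utf8_analyzer()
{
    if (!pcre_supports_utf8()
        || !class_exists('Zend_Search_Lucene_Analysis_Analyzer')) {
        return false;
    }

    // The case-insensitive variant needs mbstring, so fall back to the
    // case-sensitive analyzer when the extension is not loaded.
    $analyzer = extension_loaded('mbstring')
        ? new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
        : new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();

    Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

    // Make the query parser read search strings as UTF-8 as well.
    Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');

    return true;
}
```

Call `use_utf8_analyzer()` before both indexing and searching so that the same analyzer handles documents and queries. An index built with the old analyzer already contains the split terms, so it most likely has to be rebuilt.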

+5

Use an analyzer. See the docs on UTF-8 text analysis and the docs on writing your own analyzer. I recommend that you simply use the UTF-8 analyzer.
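To see what a UTF-8 analyzer changes, this standalone sketch imitates both behaviors with plain regular expressions. It is an approximation, not the library's code: it assumes the default analyzer only accepts ASCII letters and that the UTF-8 analyzer accepts any Unicode letter. The function names are hypothetical.

```php
<?php
// Approximates an ASCII-only tokenizer: every byte outside a-z/A-Z
// acts as a separator, including the two bytes of a UTF-8 "ä".
function tokenize_ascii($text)
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// Approximates a UTF-8 aware tokenizer: /u decodes the input as UTF-8
// and \p{L} matches any Unicode letter.
function tokenize_utf8($text)
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// "världens" written as explicit UTF-8 bytes, so the result does not
// depend on the encoding this file is saved in.
$word = "v\xC3\xA4rldens";

echo implode(' | ', tokenize_ascii($word)), "\n"; // v | rldens
echo implode(' | ', tokenize_utf8($word)), "\n";  // världens
```

The second function needs the same PCRE UTF-8 support as the UTF-8 analyzers.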

+2
