Use a UTF-8 compatible text analyzer instead of the default text analyzer for tokenization. Note that this requires PHP's PCRE (Perl-compatible regular expressions) extension to be compiled with UTF-8 support. That is the default if you use the PCRE library bundled with PHP, but it may not be the case if you use a shared library. For the case-insensitive versions of the UTF-8 compatible analyzers, you also need to enable the mbstring extension.
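If you are not sure whether your PHP build meets both requirements, a quick check along these lines may help. This is only a minimal sketch: the variable names are made up for illustration, and it assumes that a pattern using the `u` modifier fails to compile when PCRE lacks UTF-8 support.

```php
<?php
// Probe PCRE UTF-8 support. With the /u modifier, "." matches one whole
// UTF-8 character, so the two-byte sequence for "é" matches ^.$ exactly once.
// If PCRE was built without UTF-8 support, preg_match() returns false
// (the @ suppresses the accompanying warning).
$pcreUtf8 = (@preg_match('/^.$/su', "\xC3\xA9") === 1);

// mbstring is only needed for the case-insensitive UTF-8 analyzers.
$hasMbstring = extension_loaded('mbstring');

echo 'PCRE UTF-8 support: ', ($pcreUtf8 ? 'yes' : 'no'), PHP_EOL;
echo 'mbstring extension: ', ($hasMbstring ? 'yes' : 'no'), PHP_EOL;
```

If the first line prints "no", PHP has to be rebuilt or pointed at a PCRE library with UTF-8 enabled. If only the second prints "no", the case-sensitive UTF-8 analyzers should still be usable.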