Zend Lucene - Tokenizing Swedish Characters

Question


I use Zend Lucene to index Swedish texts. The problem is that Lucene splits words at the Swedish characters å, ä and ö. For example, the word "världens" becomes the two words "v" and "ldens" in the index.

Is there a way to tell Zend Lucene to accept additional characters as part of a word instead of splitting on them?


zend-framework lucene zend-search-lucene zend-lucene

Martin Dec 30 '09 at 14:11


2 answers

Use analyzers. See the documentation on text analysis with UTF-8 and the documentation on writing your own analyzer. I recommend that you simply use the UTF-8 analyzer.


Yacoby Dec 30 '09 at 14:35


ax. · Accepted Answer · 2009-12-30T14:36:27+0000

Use a UTF-8-compatible text analyzer for tokenization instead of the default text analyzer. Note that this requires PHP's PCRE (Perl-compatible regular expressions) to be compiled with UTF-8 support. That is the default when PHP uses its bundled PCRE library, but it may not be the case if PHP is linked against a shared PCRE library. For the case-insensitive versions of the UTF-8-compatible analyzers, you also need to enable the mbstring extension.
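To see why the analyzer choice matters, here is a minimal stdlib-only sketch. The functions `tokenizeAscii`, `tokenizeUtf8` and `tokenizeUtf8CaseInsensitive` are hypothetical stand-ins, not Zend_Search_Lucene's actual analyzer code. An ASCII-only letter scan treats each byte of a UTF-8 "ä" as a separator (the exact fragments Zend produced in the question may differ), while a `/u` pattern needs PCRE's UTF-8 support and case folding of "Ä" needs mbstring, which are the two requirements mentioned above.

```php
<?php
// Hypothetical illustration only; not the real Zend_Search_Lucene analyzers.

// ASCII-only tokenizer: only [a-zA-Z] counts as a letter, so the two bytes
// of a UTF-8 "ä" (0xC3 0xA4) act as a word separator.
function tokenizeAscii(string $text): array
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// UTF-8-aware tokenizer: the /u modifier makes PCRE treat the subject as
// UTF-8 (requires PCRE built with UTF-8 support); \p{L} matches any letter.
function tokenizeUtf8(string $text): array
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// Case-insensitive variant: strtolower() does not reliably fold non-ASCII
// letters, so mb_strtolower() from the mbstring extension is used instead.
function tokenizeUtf8CaseInsensitive(string $text): array
{
    return array_map(
        fn (string $token): string => mb_strtolower($token, 'UTF-8'),
        tokenizeUtf8($text)
    );
}

echo implode(' | ', tokenizeAscii('världens')) . PHP_EOL;               // v | rldens
echo implode(' | ', tokenizeUtf8('världens')) . PHP_EOL;                // världens
echo implode(' | ', tokenizeUtf8CaseInsensitive('VÄRLDENS Ände')) . PHP_EOL; // världens | ände
```

In Zend Framework 1 itself, the switch is made by registering one of the bundled UTF-8 analyzers as the default, for example `Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());`. It is presumably safest to do this in both the indexing and the searching code, so that queries are tokenized the same way as the indexed text.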
