How to find "FooBar" while passing "Foo Bar" in Zend Lucene

Question

I am building a search feature for a PHP website using Zend Lucene, and I have run into a problem. My website is a store directory (or something like it).

For example, I have a store called "FooBar", but my visitors search for "Foo Bar" and get zero results. Likewise, if the store is called "Foo Bar" and a visitor searches for "FooBar", nothing is found.

I tried searching for "foobar~" (fuzzy search), but it did not find the entry named "Foo Bar".

Is there a specific way to build the index or the query to handle this?

+4

php zend-framework lucene zend-search-lucene fuzzy-search

Daniel Apr 29 '09 at 7:09

4 answers

Manually add index entries for the most common misspellings of each name. Ask your customers to enter them in a special form.

+1

Aaron watters May 07, '09 at 14:25

Have you tried "*foo* AND *bar*" or "*foo* OR *bar*"? That works in Ferret, and I have read that it is based on Lucene.

0

klew Apr 29 '09 at 7:23

If performance is not a concern, you can use WildcardQuery (it is much slower than a plain term query):

new WildcardQuery( new Term( "propertyName", "Foo?Bar" ) );

Use '*' to match zero or more characters and '?' to match exactly one character.

If performance is important, try using BooleanQuery.
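To make the wildcard semantics concrete, here is a small sketch that uses Python's fnmatch module as a stand-in for a wildcard matcher. It does not call Lucene. It assumes the usual Lucene convention that '?' matches exactly one character and '*' matches zero or more, and that the store name is indexed as a single, lowercased, untokenized term. The helper name wildcard_match is hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_match(term, pattern):
    """Stand-in for a wildcard term match: '?' = one char, '*' = zero or more."""
    return fnmatchcase(term, pattern)


# '?' requires exactly one character between "foo" and "bar".
print(wildcard_match("foo bar", "foo?bar"))  # True
print(wildcard_match("foobar", "foo?bar"))   # False

# '*' also accepts zero characters, so it covers both spellings.
print(wildcard_match("foo bar", "foo*bar"))  # True
print(wildcard_match("foobar", "foo*bar"))   # True
```

Under these assumptions, a pattern with '*' between the two halves is the one that covers both "FooBar" and "Foo Bar".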

0

Cambium May 01, '09 at 0:40

Shashikant Kore · Accepted Answer · 2009-04-29T10:02:52+0000

Option 1: Split the input query string into two parts at each possible position and search for both parts. In this case, the query would be (+fo +obar) OR (+foo +bar) OR (+foob +ar). The drawback is that this approach assumes the input query consists of exactly two tokens. You may also get additional, possibly irrelevant results, such as matches for (+foob +ar).
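As a rough illustration of Option 1, the following sketch builds the OR-ed query string by splitting the input at every position. It only produces the query text; passing it to the Zend Lucene query parser is left out. The function name split_queries and the min_len parameter (the shortest allowed part, set to 2 here to reproduce the three splits in the example) are assumptions, not part of any library.

```python
def split_queries(term, min_len=2):
    """Build an OR of two-token queries, one clause per split point.

    Each clause requires both halves (Lucene '+' prefix). Parts shorter
    than min_len characters are skipped.
    """
    term = term.strip().lower()
    clauses = []
    for i in range(min_len, len(term) - min_len + 1):
        left, right = term[:i], term[i:]
        clauses.append("(+{} +{})".format(left, right))
    return " OR ".join(clauses)


print(split_queries("FooBar"))
# (+fo +obar) OR (+foo +bar) OR (+foob +ar)
```

The number of clauses grows with the length of the input, and only the (+foo +bar) clause is the intended one; the others are where the irrelevant results come from.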

Option 2: Use n-gram tokenization at both index time and query time. When indexing, the tokens for "foo bar" would be fo, oo, ba, ar. When searching for "foobar", the tokens would be fo, oo, ob, ba, ar. If you search with OR as the operator, the documents with the most n-gram matches rank at the top. This can be achieved with NGramTokenizer.
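The idea behind Option 2 can be sketched without Lucene. The code below splits each word into character bigrams, as in the example above, and ranks store names by how many distinct bigrams they share with the query. This is a simplified stand-in for OR-query scoring, not what NGramTokenizer or Lucene's scorer actually computes; the names bigrams, overlap, and rank are hypothetical.

```python
def bigrams(text, n=2):
    """Return character n-grams of each whitespace-separated word, lowercased."""
    grams = []
    for word in text.lower().split():
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def overlap(query, doc):
    """Count the distinct n-grams shared by the query and the document."""
    return len(set(bigrams(query)) & set(bigrams(doc)))


def rank(query, docs):
    """Order documents by shared n-gram count, best match first."""
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)


print(bigrams("foo bar"))  # ['fo', 'oo', 'ba', 'ar']
print(bigrams("foobar"))   # ['fo', 'oo', 'ob', 'ba', 'ar']
print(rank("foobar", ["Foo Fighters", "Foo Bar", "FooBar"]))
# ['FooBar', 'Foo Bar', 'Foo Fighters']
```

Here "Foo Bar" shares four of the five query bigrams, so it ranks just below the exact match "FooBar" and above names that only share "fo" and "oo".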
