Best text search engine for integrating with a custom web application?

We have a web application that allows users to upload documents, create their own documents, etc. The uploaded files are stored on Amazon S3, and the created content is stored in a MySQL database. What I'm looking for is a search engine that I can feed all our text documents into, each with a unique identifier, so that it builds an index of some kind. Later I can give it search queries, and it will return the best-matching documents (by their identifier) along with snippets of the matching text.
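To make the requirement concrete, here is a minimal sketch of the interface described above: documents go in with an identifier, and a query returns ranked identifiers with a text fragment. It is a toy inverted index using only the Python standard library, not a substitute for any of the engines discussed; the class name `TinyIndex`, its methods, and the scoring rule (count of distinct query terms present) are all assumptions made for illustration.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercased word to the IDs of documents containing it."""

    def __init__(self):
        self.docs = {}                    # doc_id -> original text
        self.postings = defaultdict(set)  # word -> set of doc_ids

    def add(self, doc_id, text):
        """Store the text and record every word it contains under doc_id."""
        self.docs[doc_id] = text
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query, width=30):
        """Return (doc_id, snippet) pairs, best match first."""
        # De-duplicate query terms while keeping their order.
        terms = list(dict.fromkeys(re.findall(r"\w+", query.lower())))
        scores = defaultdict(int)
        for term in terms:
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1  # one point per distinct query term present
        # Highest score first; ties broken by identifier for stable output.
        ranked = sorted(scores, key=lambda d: (-scores[d], str(d)))
        return [(d, self._snippet(self.docs[d], terms, width)) for d in ranked]

    @staticmethod
    def _snippet(text, terms, width):
        """Return the text surrounding the first query term found in the document."""
        for term in terms:
            match = re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
            if match:
                start = max(0, match.start() - width)
                end = min(len(text), match.end() + width)
                return text[start:end]
        return text[: 2 * width]
```

A real engine adds what this sketch leaves out: stemming, relevance ranking, persistence, and incremental updates. The shape of the calls (`add(doc_id, text)`, then `search(query)`) is the part that carries over when comparing candidates.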

Basically, we want our users to be able to search through their own repository of uploaded materials, as well as everything that other users have flagged as public. The solution should work on a standard Linux server, and ideally it would be open source, but I will also consider paid solutions if they are not outrageously priced.

So far I have found three potential candidates:

  • MySQL full-text search - some posts I have read say it is very slow.
  • Apache Lucene - unfortunately written in Java, but I will use it if necessary. Supposedly fast.
  • Sphinx - it does not seem to be very popular; ideally, whatever solution I choose will have great community support.

Please let me know if there are any other good options that I have missed, or if you have experience with any of the above.

+3
linux web-applications search full-text-search
Sep 22 '08 at 22:22
6 answers

Take a look at Solr. It is built on Lucene, so it is very fast, and because it runs as a standalone server that you talk to over HTTP, it is easy to use from any platform.
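As a rough illustration of what "from any platform" means in practice, the sketch below only builds the URL for a Solr search request with highlighting enabled; it does not contact a server. The parameters `q`, `df`, `fl`, `hl`, `hl.fl`, `rows` and `wt` are standard Solr query parameters, but the host, the core name `documents`, the field name `content`, and the `/solr/<core>/select` path layout (which is how current Solr releases expose a core) are assumptions about your setup.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, user_query, text_field="content", rows=10):
    """Build a Solr /select URL that asks for document IDs plus highlighted snippets.

    base_url, core and text_field are placeholders for your own deployment.
    """
    params = {
        "q": user_query,      # the user's search terms
        "df": text_field,     # default field to search when the query names none
        "fl": "id,score",     # return only the identifier and relevance score
        "hl": "true",         # enable highlighting (matching text fragments)
        "hl.fl": text_field,  # field to take the fragments from
        "rows": rows,         # maximum number of results
        "wt": "json",         # response format
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"
```

Fetching that URL with any HTTP client should return JSON in which the matching documents are listed under `response.docs` and the fragments appear in a separate `highlighting` section keyed by document ID, assuming the schema stores the `content` field.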

+4
Sep 29 '08 at 16:12

Sphinx may be worth a look, as it works well with several common RDBMSs (in particular, MySQL).

+2
Sep 29 '08 at 16:15

There is also Xapian, which is fast and completely customizable.

It supports custom indexes that allow you to index data that is not stored in the database, which may be useful for your documents stored on S3.

+1
Sep 29 '08 at 15:34

I believe Google has a solution that suits your needs. Start here: Google Enterprise

0
Sep 22 '08 at 22:26

There is a Ruby port of Lucene called Ferret. In addition to the Ruby API, you can get the underlying C implementation, called cFerret.

0
Sep 22 '08 at 22:42

Lucene is very good. And although it was originally written in Java, there is a PHP implementation: http://framework.zend.com/manual/en/zend.search.lucene.html

0
Sep 22 '08 at 22:46