We have a web application that allows users to upload documents, create their own documents, and so on. The uploaded files are stored on Amazon S3, and the content users create is stored in a MySQL database. What I'm looking for is a kind of search engine that I can feed all of our text documents, each with a unique identifier, and that builds an index (or something similar) from them. Later I can send it search queries, and it will return the best-matching documents (by their identifiers) along with snippets of the matching text.
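To make the requirement concrete, here is a toy sketch of what such an engine does internally. It builds an inverted index (term → documents containing it), ranks hits with a simple TF-IDF score, and cuts a snippet around the first matching term. The class name `TinyIndex` and its methods are hypothetical and are not the API of Lucene, Sphinx, or MySQL. Real engines add stemming, stop words, phrase queries, persistence, and much better snippet highlighting.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercased word tokens; a real engine would also stem and drop stop words."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


class TinyIndex:
    """Hypothetical toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}  # doc_id -> original text, kept only for snippets

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query, limit=10):
        """Return [(doc_id, snippet)] ordered by a simple TF-IDF score."""
        n_docs = len(self.docs)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            postings = self.postings.get(term)
            if not postings:
                continue
            # Rarer terms get a larger weight.
            idf = math.log(1 + n_docs / len(postings))
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc_id so output is deterministic.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [(doc_id, self.snippet(doc_id, query)) for doc_id, _ in ranked[:limit]]

    def snippet(self, doc_id, query, width=30):
        """Cut `width` characters on each side of the first query term found."""
        text = self.docs[doc_id]
        lowered = text.lower()  # naive: assumes lower() keeps string length
        for term in tokenize(query):
            pos = lowered.find(term)
            if pos != -1:
                start = max(0, pos - width)
                end = min(len(text), pos + len(term) + width)
                return text[start:end]
        return text[: 2 * width]


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "Quarterly report on S3 storage costs")
    index.add(2, "Meeting notes about the MySQL migration")
    index.add(3, "MySQL full-text search is slow on large tables")
    for doc_id, snippet in index.search("mysql slow"):
        print(doc_id, snippet)
```

Here document 3 ranks above document 2 because it matches both query terms. The identifiers returned are whatever keys the application already uses (for example, S3 object keys or MySQL primary keys), so the full document can be fetched from its original store.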
Basically, we want our users to be able to search through their own repository of uploaded materials, as well as everything that other users have flagged as public. The solution should run on a standard Linux server. Ideally it would be open source, but I will also consider paid solutions if they are not outrageously priced.
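The visibility rule ("my own documents, plus anything another user flagged as public") can be expressed as a filter over ranked hits. This is a minimal sketch assuming each document has an owner ID and a public flag, presumably coming from the MySQL table; the names `DocMeta`, `owner_id`, `is_public`, and `filter_hits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical per-document metadata, e.g. loaded from MySQL."""
    owner_id: int
    is_public: bool


def visible_to(user_id, meta):
    """A user sees their own documents and anything flagged public."""
    return meta.owner_id == user_id or meta.is_public


def filter_hits(hits, metadata, user_id):
    """Keep only the ranked doc_ids the user may see, preserving rank order."""
    return [
        doc_id
        for doc_id in hits
        if doc_id in metadata and visible_to(user_id, metadata[doc_id])
    ]
```

Filtering after ranking is simple but can leave a results page nearly empty when most top hits are private. Engines such as Lucene and Sphinx can store fields like these alongside the index and apply the filter during the query itself, which is usually the better design.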
So far I have found three potential candidates:
- MySQL full-text search: some posts I've read say it is very slow
- Apache Lucene: unfortunately it is written in Java, but I will use it if necessary. Supposedly fast
- Sphinx: it doesn't seem to be very popular, and ideally whatever solution I pick will have strong community support
Please let me know if there are any other good options that I have missed, or if you have experience with any of the above.
linux web-applications search full-text-search
davr Sep 22 '08 at 22:22