What is the best Java text indexing library for the Google App Engine?

So far I know that Compass can do the job, but indexing with Compass is quite expensive. Are there any lighter alternatives?

+7
java google-app-engine full-text-indexing
5 answers

Apache Lucene is the de facto choice for full-text indexing in Java. Compass Core seems to contain an "implementation of the Lucene Directory to store the index in a database (using JDBC). It is separate from the Compass code base and can be used with pure Lucene applications", plus tons of other things. You could try to extract only the Lucene component, thereby dropping several libraries and making things lighter. Either that, or ditch Compass altogether and use pure, unadorned Lucene.

+4

Honestly, I don't know whether Lucene will be any lighter than Compass in terms of indexing (why would it be? Doesn't Compass use Lucene for this?).

In any case, since you asked for alternatives, there is GAELucene. I quote its announcement below:

Enlightened by the discussion "Can I run Lucene on Google App Engine?", I have implemented a Google datastore-based Lucene component, GAELucene, which can help you run search applications on Google App Engine.

The main classes of GAELucene include:

  • GAEDirectory - a read-only Directory based on the Google datastore.
  • GAEFile - represents an index file; the file's byte content is split into several GAEFileContent entries.
  • GAEFileContent - represents one segment of an index file.
  • GAECategory - the identifier that distinguishes different indexes.
  • GAEIndexInput - a memory-resident IndexInput implementation, like RAMInputStream.
  • GAEIndexReader - a wrapper for IndexReader; instances are cached in GAEIndexReaderPool.
  • GAEIndexReaderPool - a pool of GAEIndexReader instances.
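
As a rough illustration of the GAEFile / GAEFileContent idea in the list above (one index file stored as several smaller segments, presumably because a single datastore entity is limited in size), here is a minimal sketch in plain Java. It does not use GAELucene or the datastore at all; the class name ChunkedFileSketch, the methods split and join, and the tiny 4-byte segment size are all hypothetical and chosen only for the demonstration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not GAELucene code: shows how one file's bytes can be
// divided into fixed-size segments and reassembled in order.
public class ChunkedFileSketch {

    // Assumed segment size for the demo; a real store would choose a value
    // below its per-entity size limit.
    static final int SEGMENT_SIZE = 4;

    /** Splits content into consecutive segments of at most segmentSize bytes. */
    static List<byte[]> split(byte[] content, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += segmentSize) {
            int end = Math.min(offset + segmentSize, content.length);
            segments.add(Arrays.copyOfRange(content, offset, end));
        }
        return segments;
    }

    /** Concatenates the segments, in order, back into the original bytes. */
    static byte[] join(List<byte[]> segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] segment : segments) {
            out.write(segment, 0, segment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] indexFile = "0123456789".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> segments = split(indexFile, SEGMENT_SIZE);
        System.out.println(segments.size());                          // prints 3
        System.out.println(Arrays.equals(indexFile, join(segments))); // prints true
    }
}
```

In a datastore-backed Directory, each segment would be persisted as its own entity and looked up by file name plus segment number; that part is left out here because it depends on the storage API.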

The following code snippet demonstrates how to use GAELucene to perform a search:

    Query queryObject = parserQuery(request);
    GAEIndexReaderPool readerPool = GAEIndexReaderPool.getInstance();
    GAEIndexReader indexReader = readerPool.borrowReader(INDEX_CATEGORY_DEMO);
    IndexSearcher searcher = new IndexSearcher(indexReader);
    Hits hits = searcher.search(queryObject);
    readerPool.returnReader(indexReader);
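
One thing worth noting about the borrow/return pattern in that snippet: if the search throws, the reader is never handed back to the pool. A common way to guard against that is a try/finally block. The sketch below shows the idea with a toy pool in plain Java; Reader, ReaderPool, idleCount and searchWithPool are hypothetical stand-ins, not the GAELucene classes, and the "search" merely returns the length of the term.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the borrow/return pattern; none of these classes
// come from GAELucene or Lucene.
public class ReaderPoolSketch {

    /** Stand-in for an index reader that is expensive to open. */
    static final class Reader {
        int search(String term) {
            if (term.isEmpty()) {
                throw new IllegalArgumentException("empty query");
            }
            return term.length(); // placeholder for a hit count
        }
    }

    /** Minimal pool: reuses an idle reader if there is one, else opens a new one. */
    static final class ReaderPool {
        private final Deque<Reader> idle = new ArrayDeque<>();

        synchronized Reader borrowReader() {
            Reader reader = idle.pollFirst();
            return reader != null ? reader : new Reader();
        }

        synchronized void returnReader(Reader reader) {
            idle.addFirst(reader);
        }

        synchronized int idleCount() {
            return idle.size();
        }
    }

    /** Borrows a reader, searches, and always returns the reader to the pool. */
    static int searchWithPool(ReaderPool pool, String term) {
        Reader reader = pool.borrowReader();
        try {
            return reader.search(term);
        } finally {
            pool.returnReader(reader); // runs on success and on exception
        }
    }

    public static void main(String[] args) {
        ReaderPool pool = new ReaderPool();
        System.out.println(searchWithPool(pool, "lucene")); // prints 6
        System.out.println(pool.idleCount());               // prints 1
    }
}
```

The same shape should carry over to the real pool: call borrowReader before the try, and returnReader inside finally.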

I warmly recommend reading the entire discussion; it is very informative.

Just in case, regarding Compass: Shay Banon wrote a blog post on how to use Compass on App Engine here: http://www.kimchy.org/searchable-google-appengine-with-compass/

+6

For Google App Engine, the only indexing library I've seen is appengine-search, with a description of how to use it on this page. I have not tried it, though.

I have used Lucene (which Compass is based on) and found that it works great at a relatively low cost. Indexing is a task that you can schedule for times that suit your application.

Some alternative indexing projects are mentioned in this SO thread, including Xapian and minion. I did not look into any of them, though, since Lucene did everything I needed very well.

+1

Google App Engine's built-in search looks better and even supports synonyms:

https://developers.google.com/appengine/docs/java/search/

0

If you want to run Lucene on GAE, you can also take a look at LuGAEne. It is a Lucene Directory implementation for GAE.

Usage is actually quite simple: just replace one of Lucene's standard directories with GaeDirectory:

    Directory directory = new GaeDirectory("MyIndex");
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
    IndexWriter writer = new IndexWriter(directory, config);
    ...

gaelucene seems to be in "maintenance mode" (no commit since September 2009) and lucene-appengine does not work (yet) when you use Objectify version 4 in your application.

Disclaimer: I am the author of LuGAEne.

0
