How fast is Whoosh?

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python (according to the official site).

However, I cannot find a comparison of its speed and performance with other search engines, especially Lucene (PyLucene, Lupyne, ...).

I use PyLucene, which is known to be fast but is rather clunky and not easy to use (it is a direct wrapper around Java Lucene). There is a Python wrapper for PyLucene, Lupyne, but it is not convenient when the basic Lucene functions are needed.

Any performance comparisons between Whoosh and other engines would be appreciated.

1 answer

Whoosh vs Xappy / Xapian

There are benchmarks for Python search backends that support Whoosh and Xappy / Xapian.

The Whoosh authors used these benchmarks to test Whoosh against Xappy / Xapian (ref):

How the benchmark works

N documents are created. Each contains a search word (a random word 10 characters long) plus 10 additional fields with 100 characters of random material each (only to pump up the size of the document).

For indexing, all fields are indexed and stored.

For searching, all words are searched for in random order, and all stored fields are retrieved.
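The procedure described above can be sketched roughly as follows. This is not the original benchmark code: the function names and the DictBackend class are hypothetical, and DictBackend is only an in-memory stand-in for a real engine such as Whoosh or Xappy / Xapian. The parameter names and values are taken from the benchmark output.

```python
import random
import string
import time

# Parameter names and values as printed in the benchmark output.
DOC_COUNT = 3000
WORD_LEN = 10
EXTRA_FIELD_COUNT = 10
EXTRA_FIELD_LEN = 100


def random_text(rng, length):
    """Return `length` random lowercase ASCII letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_documents(rng, count=DOC_COUNT):
    """Build documents: one searchable word plus padding fields."""
    docs = []
    for _ in range(count):
        doc = {"word": random_text(rng, WORD_LEN)}
        for i in range(EXTRA_FIELD_COUNT):
            doc["extra%d" % i] = random_text(rng, EXTRA_FIELD_LEN)
        docs.append(doc)
    return docs


class DictBackend:
    """Hypothetical stand-in for a real engine (not Whoosh or Xapian)."""

    def __init__(self):
        self.index = {}

    def add(self, doc):
        # "Index and store": keep the whole document under its search word.
        self.index.setdefault(doc["word"], []).append(doc)

    def search(self, word):
        # Return every stored field of each matching document.
        return self.index.get(word, [])


def run_benchmark(backend, docs, rng):
    """Time indexing, then search for every word in random order."""
    start = time.perf_counter()
    for doc in docs:
        backend.add(doc)
    index_secs = time.perf_counter() - start

    words = [doc["word"] for doc in docs]
    rng.shuffle(words)
    start = time.perf_counter()
    hits = sum(len(backend.search(word)) for word in words)
    search_secs = time.perf_counter() - start
    return index_secs, search_secs, hits


if __name__ == "__main__":
    rng = random.Random(0)
    docs = make_documents(rng)
    index_secs, search_secs, hits = run_benchmark(DictBackend(), docs, rng)
    print("Indexing takes %.3fs" % index_secs)
    print("Searching takes %.3fs (%d hits)" % (search_secs, hits))
```

To compare real engines, DictBackend would be replaced by a class with the same add and search methods that delegates to the engine being measured.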

For Whoosh, we used a multiprocessing writer to build the index. This explains why it indexes faster than xappy: it used all 4 cores, not just 1.

For searching, xappy / xapian is faster (no parallel processing was used). But as you can see, the speed difference between xappy and Whoosh may not be as large as you expected.

Index size about 12 MB

# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
# [GCC 4.6.1] on linux2

Params:
DOC_COUNT: 3000
WORD_LEN: 10
EXTRA_FIELD_COUNT: 10
EXTRA_FIELD_LEN: 100

Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)

Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
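To put the numbers above in perspective, here is a small sketch that computes the throughput ratios from the reported figures. The variable names are my own; the rates are copied from the output. It works out to roughly 3.3x for indexing in favour of Whoosh (with 4 cores against 1) and roughly 1.8x for searching in favour of xappy / xapian.

```python
# Throughput figures (operations per second) copied from the output above.
results = {
    "xappy 0.5 / xapian 1.2.5": {"indexing": 1068.9, "searching": 6635.8},
    "whoosh 2.3.2": {"indexing": 3575.6, "searching": 3714.8},
}

xapian = results["xappy 0.5 / xapian 1.2.5"]
whoosh = results["whoosh 2.3.2"]

# Whoosh indexed with 4 processes, xappy with 1, so this is not per-core.
index_ratio = whoosh["indexing"] / xapian["indexing"]
# Searching was single-process for both engines.
search_ratio = xapian["searching"] / whoosh["searching"]

print("Indexing: whoosh is %.1fx faster than xappy" % index_ratio)
print("Searching: xappy is %.1fx faster than whoosh" % search_ratio)
```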

Source: https://habr.com/ru/post/1215541/

