Django Haystack Substring Search

I recently added search capabilities to my Django-based website so that employers can search for employees by keyword. When a user first uploads his resume, I convert it to text, strip out the stop words, and store the result in a TextField for that user. I am using Django-Haystack with the Whoosh search engine.

Three questions:

1) Besides the extra features that I probably won't use, is there any specific advantage to switching to Solr or Xapian?

2) When converting a resume to text, I am essentially indexing the PDF myself. I know that both Xapian and Solr support .pdf indexing; however, as far as I can tell, Haystack does not. Any tips on getting around this? Or should I keep indexing it myself? If so, should I do more than just provide a text file of keywords?

3) Whoosh only returns a result if the keyword matches exactly. If a user has "mathematics" as a keyword and I search for "math", I want this user to appear. I could not determine for certain whether Xapian or Solr supports this. Thoughts?

Thanks for any suggestions. I'll keep digging into it myself in the meantime.

+7
django search indexing model django-haystack
1 answer

Unfortunately, I don't know enough to answer your other questions, but for point 3, Whoosh does support this.

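You will need to use the autocomplete method of SearchQuerySet, which in turn requires the searched text to be indexed in an EdgeNgramField (or NgramField), as described in the documentation linked below.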

More details here: http://docs.haystacksearch.org/dev/autocomplete.html

I am currently using Whoosh this way, and partial matches work for me.
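
To show why an edge n-gram field makes a search for "math" find "mathematics", here is a minimal plain-Python sketch of the idea. It is not Haystack's or Whoosh's actual implementation; the function names (edge_ngrams, build_index, search), the size limits, and the sample resume data are all made up for illustration.

```python
from collections import defaultdict


def edge_ngrams(word, min_size=2, max_size=15):
    """Return the leading substrings ("edge n-grams") of a word."""
    word = word.lower()
    upper = min(len(word), max_size)
    return [word[:size] for size in range(min_size, upper + 1)]


def build_index(documents):
    """Map every edge n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            for gram in edge_ngrams(word):
                index[gram].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that have a word starting with every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = [index.get(term, set()) for term in terms]
    return set.intersection(*results)


# Hypothetical resume keywords, keyed by user id (illustrative data only).
resumes = {
    1: "mathematics statistics python",
    2: "marketing sales",
}
index = build_index(resumes)
print(search(index, "math"))  # user 1 matches through "mathematics"
```

The trade-off is a larger index, since every word is stored once per prefix. Note also that this only matches from the start of a word; matching in the middle of a word would need full n-grams rather than edge n-grams.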

+6