I recently added search capabilities to my Django-based website so that employers can search for employees by keyword. When a user first uploads his resume, I convert it to text, remove the stop words, and store the resulting text in a TextField for that user. I am using Django-Haystack with the Whoosh search engine.
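For context, the preprocessing step looks roughly like the sketch below. The stop-word list and the name `resume_to_keywords` are placeholders for illustration, not my exact code, and the sketch assumes the PDF has already been converted to plain text.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "to", "for", "with", "i", "is", "on"}
)


def resume_to_keywords(raw_text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words.

    The token pattern keeps '+' and '#' so that terms such as 'c++' and 'c#'
    survive tokenization.
    """
    tokens = re.findall(r"[a-z0-9+#]+", raw_text.lower())
    return " ".join(t for t in tokens if t not in stop_words)


if __name__ == "__main__":
    sample = "I worked on the design of search features in Python and Django."
    # The returned string is what would be saved to the user's TextField.
    print(resume_to_keywords(sample))
```

The string this returns is what ends up in the TextField that Haystack indexes.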
Three questions:
1) Besides the extra features that I probably won't use, is there any specific advantage to switching to Solr or Xapian?
2) When converting a resume to text, I essentially index the PDF myself. I know that both Xapian and Solr support .pdf indexing, but as far as I can tell, Haystack does not. Any tips on getting around this? Or should I keep indexing it myself? If so, should I do more than just provide a text file of keywords?
3) Whoosh only returns a result if the keyword matches exactly. If a user has a keyword that is only a partial or close match for "math" and I search for "math", I still want this user to appear. I could not find a definitive answer on whether Xapian or Solr supports this. Thoughts?
Thanks for any suggestions. I'm going to keep digging into it myself for now.
django search indexing model django-haystack
dpetters