A common practice is not to index so-called stop words when analyzing documents for a search engine. Stop words are very frequent words such as "a", "the", and "this". The idea is that if you index stop words, they take up too much space in the index and add little to the quality of the search results.
I would like to know if this is always the case.
In modern search engines, does indexing stop words make the size of the index explode, or is it just a slight increase?
Also, how does removing stop words affect phrase searches? A search for "the Beatles" and a search for "Beatles" seem like two very different queries.
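To make the phrase-search concern concrete, here is a toy sketch in Python. It is not how Elasticsearch, Solr, or Lucene actually analyze text; the stop word list, the whitespace tokenizer, and the `phrase_match` helper are all hypothetical and only illustrate the effect I am worried about.

```python
# Toy model only: a hypothetical stop word list and a naive whitespace tokenizer.
STOP_WORDS = {"a", "the", "this"}


def tokenize(text, remove_stop_words):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def phrase_match(query, document, remove_stop_words):
    """Return True if the query tokens occur as a contiguous run in the document."""
    q = tokenize(query, remove_stop_words)
    d = tokenize(document, remove_stop_words)
    if not q:
        # A query made only of stop words has nothing left to match.
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "john collected beatles records"

# With stop words removed, "the beatles" collapses to "beatles" and matches.
print(phrase_match("the beatles", doc, remove_stop_words=True))

# With stop words kept, the document lacks the phrase "the beatles".
print(phrase_match("the beatles", doc, remove_stop_words=False))
```

In this sketch, the same phrase query matches or misses the same document depending only on whether stop words were dropped, which is exactly the difference I am asking about.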
I am building an application with Elasticsearch, but this question applies equally to Solr, plain Lucene, or any other option.