I am trying to find a good library for building a search engine over a linguistic corpus. The engine should return exhaustive search results (the exact number of matches found, with no cutoff, even if the entire corpus matches), support a basic query syntax (AND, OR and NOT operators, proximity search, wildcard search), and allow restricting the set of documents to search (i.e., filtering by subcorpus). An important requirement is the ability to split the index into parts and search them in parallel: the corpus is about 10^8 words, and the search service must respond in real time.
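To make the requirements concrete, here is a minimal, library-agnostic sketch of "exact counts over a split index, searched in parallel". Everything in it (`Shard`, `count_near`, `parallel_count`, the `genre` metadata field) is hypothetical and is not the API of Sphinx or CLucene. It assumes whitespace-tokenized documents, and it uses Python threads only to show the fan-out/merge shape; CPython threads will not give real CPU parallelism for this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Toy positional inverted index over one slice of the corpus (hypothetical)."""

    def __init__(self, docs):
        # docs maps doc_id -> (list of tokens, metadata dict)
        self.postings = {}  # term -> {doc_id: [token positions]}
        self.meta = {}      # doc_id -> metadata used for subcorpus filtering
        for doc_id, (tokens, meta) in docs.items():
            self.meta[doc_id] = meta
            for pos, tok in enumerate(tokens):
                self.postings.setdefault(tok, {}).setdefault(doc_id, []).append(pos)

    def docs(self, term):
        # Set of documents containing the term (empty if the term is unknown).
        return set(self.postings.get(term, {}))

    def count_near(self, a, b, k, exclude=None, doc_filter=lambda meta: True):
        """Exact count of (a, b) position pairs at most k tokens apart, in documents
        that contain both terms (AND), do not contain `exclude` (NOT), and pass
        `doc_filter` (subcorpus restriction). No result cap is applied."""
        candidates = self.docs(a) & self.docs(b)
        if exclude is not None:
            candidates -= self.docs(exclude)
        hits = 0
        for doc_id in candidates:
            if not doc_filter(self.meta[doc_id]):
                continue
            pos_b = self.postings[b][doc_id]
            for pa in self.postings[a][doc_id]:
                # pb != pa avoids counting a token against itself when a == b
                hits += sum(1 for pb in pos_b if pb != pa and abs(pa - pb) <= k)
        return hits


def parallel_count(shards, *args, **kwargs):
    """Query every shard concurrently and sum the exact per-shard counts."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        return sum(pool.map(lambda s: s.count_near(*args, **kwargs), shards))


if __name__ == "__main__":
    shard1 = Shard({
        1: ("the old man and the sea".split(), {"genre": "fiction"}),
        2: ("the sea is old".split(), {"genre": "news"}),
    })
    shard2 = Shard({3: ("an old grey sea wall".split(), {"genre": "fiction"})})
    print(parallel_count([shard1, shard2], "old", "sea", 3))
```

Because each shard returns a full count rather than a top-N list, merging is just a sum, which is what makes exact totals compatible with splitting the index. With the sample data, "old" within 3 tokens of "sea" matches once in document 2 and once in document 3, so the script prints 2.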
The main choice is between Sphinx and CLucene (the C++ port of Lucene). Unfortunately, I know little about how these libraries are organized internally, so it would be very helpful to know which one best suits my requirements.
(I have also tried a specialized engine, IMS Corpus Workbench, which turned out not to be as scalable as necessary.)