You use the standard text_general field type for the title attribute, which may not be a good choice. text_general is intended for longer passages of text (or at least full sentences), not for exact matching of names or titles.
The problem here is that text_general uses StandardTokenizerFactory:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
The Solr documentation describes StandardTokenizerFactory as follows:
A good general-purpose tokenizer that strips many extraneous characters and sets token types to meaningful values. Token types are only useful for subsequent token filters that are type-aware of the same token types.
This means that the "-" character is discarded and treated as a token boundary.
"kong-fu" is indexed as the two tokens "kong" and "fu"; the "-" disappears.
This also explains why select?q=title:\- doesn't work here: the "-" is never indexed as a token, so there is nothing for the query to match.
Choose a more suitable field type:
Instead of StandardTokenizerFactory you can use solr.WhitespaceTokenizerFactory, which splits only on whitespace, so each word is matched exactly as written. Creating your own field type for the title attribute would therefore solve the problem.
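A custom field type along these lines might look like the sketch below. The names text_title and title are only placeholders, and the lowercase filter is an assumption on my part (drop it if matching should be case-sensitive):

```xml
<!-- Hypothetical field type: splits on whitespace only, so "kong-fu" stays one token -->
<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Optional: makes matching case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed field declaration; adjust indexed/stored to your schema -->
<field name="title" type="text_title" indexed="true" stored="true"/>
```

Because the analyzer has no type attribute, the same chain applies at index and query time. After changing the field type you would need to reindex the existing documents for the new tokenization to take effect.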
Solr also ships with a minimal field type called text_ws. Depending on your requirements, this may be sufficient.
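For reference, in the example schema that comes with Solr, text_ws is usually defined roughly as follows. Check your own schema.xml, since the exact definition can differ between versions:

```xml
<!-- Whitespace-only tokenization, no further filters (case-sensitive) -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Note that this variant has no lowercase filter, so "Kong-Fu" and "kong-fu" would be treated as different tokens.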
jHilscher