Solr: cannot find numbers mixed with characters

I have some items in my index (Solr 4.4) with names like Foobar 135g, where 135g refers to a weight. Searching for foobar or foobar 135 works, but when I search for the exact phrase Foobar 135g, nothing is found.

I analyzed the query in the Solr Admin "Analysis" panel. Everything looks fine there: the field is indexed correctly, the query is tokenized correctly, and I get hits (indicated by the purple background on the matching tokens).

So the problem must lie in how Solr processes the strings at index time and/or query time. This is the field definition I use:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="30"/>
    <filter class="solr.ReverseStringFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="30"/>
    <filter class="solr.ReverseStringFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

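I combine ReverseStringFilterFactory and EdgeNGramFilterFactory so that substrings such as foob, bar, and obar are indexed (that is, substrings truncated at the start or at the end). In addition, I use WordDelimiterFilterFactory with catenateWords.

According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters), generateNumberParts defaults to 1, so 135g is split into 135 and g. With preserveOriginal, however, 135g should also be indexed as is. The "Analysis" panel confirms this: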

[Image: Analysis panel in the Solr Admin interface, WDF (WordDelimiterFilterFactory) stage]

So something goes wrong somewhere, but where?
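As a reading aid, the filter from the field type above can be written with the relevant defaults spelled out. This is only a sketch: the three attributes marked as defaults are not in the original configuration, and their values are the defaults as documented on the wiki page linked above (worth checking against the Solr version in use).

```xml
<!-- Same filter as in the field type, with implicit defaults made explicit. -->
<!-- generateWordParts, generateNumberParts and splitOnNumerics are assumed defaults, -->
<!-- added here for illustration only; the other attributes come from the question. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        splitOnNumerics="1"
        catenateWords="1"
        catenateAll="1"
        preserveOriginal="1"/>
```

With these settings, 135g is expected to yield the original token 135g (preserveOriginal), the parts 135 and g (split at the digit-to-letter transition), and the re-joined 135g (catenateAll).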

UPDATE

I found something. When I search for 135g, the debug output is:

<lst name="debug">
  <str name="rawquerystring">name_texts:135g</str>
  <str name="querystring">name_texts:135g</str>
  <str name="parsedquery">MultiPhraseQuery(name_texts:"(135g 135) (g 135g)")</str>
  <str name="parsedquery_toString">name_texts:"(135g 135) (g 135g)"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  ...
</lst>

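So it looks like solr.WordDelimiterFilterFactory at query time is causing this. Why does Solr build a MultiPhraseQuery? I would have expected the tokens produced by solr.WordDelimiterFilterFactory to be treated as alternatives (that is, combined with OR).

Any ideas? ;)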

1

Try changing your WordDelimiterFilterFactory configuration. Add: splitOnNumerics="0".

Update:

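From the documentation: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

solr.WordDelimiterFilterFactory

Creates solr.analysis.WordDelimiterFilter.

The relevant parameter:

splitOnNumerics="1" causes alphabet => number transitions to generate a new part [Solr 1.3]: "j2se" => "j" "2" "se". The default is true ("1"); set to 0 to turn off.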
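A minimal sketch of what this suggestion could look like in the query analyzer from the question. Only splitOnNumerics="0" is new; everything else is copied from the question. Whether the index analyzer should get the same change is an assumption left open here, and changing it would require reindexing.

```xml
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <!-- splitOnNumerics="0": do not split at letter/digit transitions, -->
  <!-- so 135g should stay a single token instead of becoming 135 + g. -->
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnNumerics="0"
          catenateWords="1" catenateAll="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

The effect can be checked by running 135g through the "Analysis" panel again and by comparing parsedquery in the debugQuery output before and after the change.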

2

I took your field type as is and tested it on Solr 4.5.1. The following queries all work for me: test_mytext:"foobar 135g", test_mytext:foobar 135g, test_mytext:foobar, test_mytext:135g, test_mytext:135. Here test_mytext is a field that uses your field type. The field definition is: <field name="text" type="mytext" indexed="true" stored="true"/>

Update 3: the debug output for this query on my side.

Query => test_mytext:135g

"debug": {
  "rawquerystring": "test_mytext:135g",
  "querystring": "test_mytext:135g",
  "parsedquery": "test_mytext:135g test_mytext:135 test_mytext:g test_mytext:135g",
  "parsedquery_toString": "test_mytext:135g test_mytext:135 test_mytext:g test_mytext:135g",
  "explain": {
    "200": "\n0.8563627 = (MATCH) product of:\n 1.141817 = (MATCH) sum of:\n 0.35407978 = (MATCH) weight(test_mytext:135g in 1) [DefaultSimilarity], result of:\n 0.35407978 = score(doc=1,freq=2.0 = termFreq=2.0\n), product of:\n 0.45980635 = queryWeight, product of:\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.13194223 = queryNorm\n 0.77006286 = fieldWeight in 1, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.15625 = fieldNorm(doc=1)\n 0.4336574 = (MATCH) weight(test_mytext:135 in 1) [DefaultSimilarity], result of:\n 0.4336574 = score(doc=1,freq=3.0 = termFreq=3.0\n), product of:\n 0.45980635 = queryWeight, product of:\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.13194223 = queryNorm\n 0.94313055 = fieldWeight in 1, product of:\n 1.7320508 = tf(freq=3.0), with freq of:\n 3.0 = termFreq=3.0\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.15625 = fieldNorm(doc=1)\n 0.35407978 = (MATCH) weight(test_mytext:135g in 1) [DefaultSimilarity], result of:\n 0.35407978 = score(doc=1,freq=2.0 = termFreq=2.0\n), product of:\n 0.45980635 = queryWeight, product of:\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.13194223 = queryNorm\n 0.77006286 = fieldWeight in 1, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 3.4849067 = idf(docFreq=2, maxDocs=36)\n 0.15625 = fieldNorm(doc=1)\n 0.75 = coord(3/4)\n"
  },

This was run on Solr 4.5.1.

Update 4: I also tested this on Solr 4.4.0, and it works there as well.

Query => name_texts:"135g"

Result:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">100</str>
    <str name="name_texts">Foobar 135g</str>
    <long name="_version_">1456487722571005952</long>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">name_texts:"135g"</str>
  <str name="querystring">name_texts:"135g"</str>
  <str name="parsedquery">MultiPhraseQuery(name_texts:"(135g 135) (g 135g)")</str>
  <str name="parsedquery_toString">name_texts:"(135g 135) (g 135g)"</str>

For this test I used the Solr schema.xml with your field type and indexed the following document => {"id": "100", "name_texts": "Foobar 135g"}. The query URL was: http://localhost:8983/solr/collection1/select?q=name_texts%3A%22135g%22&wt=xml&indent=true&debugQuery=true
