ElasticSearch - Hyphenated Search

Elasticsearch 1.6

I want to index text containing hyphens, such as U-12, U-17, WU-12, T-shirt ... and be able to use the "Simple Query String" query to search for them.

Sample data (simplified):

{"title":"U-12 Soccer",
 "comment": "the t-shirts are dirty"}

Since there are already quite a few questions about hyphens, I have tried the following solution:

Use a char filter: ElasticSearch - search with hyphens by name.

So I went with this mapping:

{
  "settings":{
    "analysis":{
      "char_filter":{
        "myHyphenRemoval":{
          "type":"mapping",
          "mappings":[
            "-=>"
          ]
        }
      },
      "analyzer":{
        "default":{
          "type":"custom",
          "char_filter":  [ "myHyphenRemoval" ],
          "tokenizer":"standard",
          "filter":[
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings":{
    "test":{
      "properties":{
        "title":{
          "type":"string"
        },
        "comment":{
          "type":"string"
        }
      }
    }
  }
}

The search is performed with the following query:

{"_source":true,
  "query":{
    "simple_query_string":{
      "query":"<Text>",
      "default_operator":"AND"
    }
  }
}

  • What works:

    "U-12", "U*", "t*", "ts*"

  • What does not work:

    "U-*", "u-1*", "t-*", "t-sh*", ...

So it seems that the char filter is not applied to the search strings? What can I do to make this work?

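By default, simple_query_string does not analyze terms that contain wildcards. As a result, a query such as i-ma* looks for tokens that start with i-ma. The word i-mac does not match, because during analysis it is split into the tokens i and mac, and neither of them starts with i-ma. To make the query find i-mac, enable wildcard analysis: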

{
  "_source":true,
  "query":{
    "simple_query_string":{
      "query":"u-1*",
      "analyze_wildcard":true,
      "default_operator":"AND"
    }
  }
}
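
To make the effect of analyze_wildcard easier to see, here is a minimal Python sketch. It is only a rough stand-in for the question's analyzer (hyphen removal, splitting, lowercasing), not the real Lucene analysis chain; the function names analyze and prefix_matches are hypothetical. It lowercases the prefix in both modes because the question reports that "U*" matches.

```python
import re


def analyze(text):
    """Approximate the question's analyzer: drop hyphens, split, lowercase."""
    stripped = text.replace("-", "")  # mimics the mapping char filter "-=>"
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", stripped)]


def prefix_matches(indexed_text, query, analyze_wildcard):
    """Check whether a prefix query such as 'u-1*' matches any indexed token."""
    prefix = query.rstrip("*").lower()
    if analyze_wildcard:
        # With analysis enabled, the prefix goes through the same char filter,
        # so "u-1" becomes "u1" before it is compared with the index tokens.
        prefix = "".join(analyze(prefix))
    return any(tok.startswith(prefix) for tok in analyze(indexed_text))


title = "U-12 Soccer"
print(analyze(title))
print(prefix_matches(title, "u-1*", analyze_wildcard=False))
print(prefix_matches(title, "u-1*", analyze_wildcard=True))
```

Under these assumptions the title is indexed as the tokens u12 and soccer, so the raw prefix u-1 finds nothing, while the analyzed prefix u1 matches u12.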

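As noted above, adding "analyze_wildcard": true makes this work. Keep in mind, though, that without a mapping char filter the standard tokenizer splits "u-12" into the two tokens "u" and "12".

If you would rather not use the mapping char filter, you can rewrite the query instead.

For example, given "m0-77", "m1-77" and "m2-77", a search for m*-77 returns no hits. If you replace the "-" (hyphen) with AND, that is, search for m* AND 77, you get the correct results.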

For the u-* searches from the question:

{
  "query":{
    "simple_query_string":{
      "query":"u AND 1*",
      "analyze_wildcard":true
    }
  }
}

For the t-* searches:

{
  "query":{
    "simple_query_string":{
      "query":"t AND sh*",
      "analyze_wildcard":true
    }
  }
}
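
The rewrite could also be done on the client before the request is sent. The following Python sketch is one way to do that, under assumptions: the helper names split_hyphenated and build_body are hypothetical, and a bare * left over after splitting is simply dropped. Since simple_query_string documents + as its AND operator, the sketch does not emit a literal AND keyword; it joins the parts with spaces and sets default_operator to AND, as the question's query already does.

```python
import json


def split_hyphenated(text):
    """Turn 'u-1*' into 'u 1*' so each part is searched as its own term."""
    parts = [p.strip() for p in text.split("-")]
    # Drop empty pieces and a bare "*", which carries no prefix to search for
    # (an assumption of this sketch, not documented Elasticsearch behavior).
    kept = [p for p in parts if p and p != "*"]
    return " ".join(kept)


def build_body(text):
    """Build a simple_query_string request body that requires every part."""
    return {
        "query": {
            "simple_query_string": {
                "query": split_hyphenated(text),
                "default_operator": "AND",
                "analyze_wildcard": True,
            }
        }
    }


print(json.dumps(build_body("t-sh*"), indent=2))
```

This only makes sense when the hyphenated terms are indexed as separate tokens, that is, without the hyphen-removing char filter from the question.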

If someone is still looking for a simple solution to this problem: replace the hyphen with an underscore (_) when indexing the data.

For example, O-000022334 should be indexed as O_000022334.

When displaying the search results, replace the underscore with a hyphen again. That way you can search for "O-000022334" and it will find the correct match.
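
A minimal Python sketch of this replacement, with hypothetical helper names to_indexed and to_display. It assumes the search text is passed through the same replacement as the indexed data, and that the original values never contain underscores of their own (otherwise the display step would turn those into hyphens as well).

```python
def to_indexed(value):
    """Replace hyphens with underscores before indexing (or before searching)."""
    return value.replace("-", "_")


def to_display(value):
    """Restore the hyphens when showing a stored value to the user."""
    return value.replace("_", "-")


stored = to_indexed("O-000022334")
print(stored)
print(to_display(stored))
```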
