Elasticsearch: get phrase frequency in a document

Test data:

 curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{ "body": "this is a test" }'
 curl -XPUT 'localhost:9200/customer/external/2?pretty' -d '{ "body": "and this is another test" }'
 curl -XPUT 'localhost:9200/customer/external/3?pretty' -d '{ "body": "this thing is a test" }'

My goal is to get the frequency of a phrase in a document.

I know how to get the frequency of terms in a document:

 curl -g "http://localhost:9200/customer/external/1/_termvectors?pretty" -d' { "fields": ["body"], "term_statistics" : true }' 

And I know how to count documents containing a given phrase (with match_phrase or span_near request):

 curl -g "http://localhost:9200/customer/_count?pretty" -d' { "query": { "match_phrase": { "body" : "this is" } } }' 

How can I access the frequency of a phrase?

elasticsearch elasticsearch-5
1 answer

You can use term vectors. As the documentation says:

Return values

Three types of values can be requested: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields, but no term statistics.

Term information

 term frequency in the field (always returned)
 term positions (positions : true)
 start and end offsets (offsets : true)
 term payloads (payloads : true), as base64 encoded bytes

What you need is the term frequency: in the documentation's example you can see that the doc has a frequency for john doe. Note that storing term vectors duplicates the disk space used by the field they are applied to.
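
Term vectors report frequencies per term, not per phrase, so a phrase count has to be derived from the term positions. Below is a minimal sketch of one way to do that, not an Elasticsearch feature: it assumes you have already parsed the JSON returned by _termvectors (with positions included) into a dict, and that the phrase has been split into terms the same way the field's analyzer would split it (lowercased, for the standard analyzer). The function name phrase_frequency and the hand-written SAMPLE response for document 1 are made up for illustration.

```python
def phrase_frequency(termvectors_response, field, phrase_terms):
    """Count how often phrase_terms occur consecutively in one document.

    termvectors_response: parsed JSON from the _termvectors API.
    field: name of the field whose term vectors were requested.
    phrase_terms: the phrase as a list of already-analyzed terms.
    """
    if not phrase_terms:
        return 0
    terms = termvectors_response["term_vectors"][field]["terms"]
    # For each term of the phrase, collect the positions where it occurs.
    positions = []
    for term in phrase_terms:
        tokens = terms.get(term, {}).get("tokens", [])
        positions.append({token["position"] for token in tokens})
    # The phrase starts at position p if term i sits at position p + i.
    return sum(
        1
        for p in positions[0]
        if all(p + i in positions[i] for i in range(1, len(positions)))
    )


# Hand-written stand-in for the response for "this is a test" (an assumption,
# trimmed to the keys the function reads).
SAMPLE = {
    "term_vectors": {
        "body": {
            "terms": {
                "this": {"term_freq": 1, "tokens": [{"position": 0}]},
                "is": {"term_freq": 1, "tokens": [{"position": 1}]},
                "a": {"term_freq": 1, "tokens": [{"position": 2}]},
                "test": {"term_freq": 1, "tokens": [{"position": 3}]},
            }
        }
    }
}

print(phrase_frequency(SAMPLE, "body", ["this", "is"]))
```

This only handles exact adjacency (the equivalent of match_phrase with no slop); allowing gaps between terms would need a looser position check.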
