Elasticsearch query with nested aggregations causing out of memory error

I have Elasticsearch installed with 16GB of memory. I started using aggregations, but ran into the error "java.lang.OutOfMemoryError: Java heap space" when I tried to execute the following query:

    POST /test-index-syslog3/type-syslog/_search
    {
      "query": {
        "query_string": {
          "default_field": "DstCountry",
          "query": "CN"
        }
      },
      "aggs": {
        "whatever": {
          "terms": { "field": "SrcIP" },
          "aggs": {
            "destination_ip": {
              "terms": { "field": "DstIP" },
              "aggs": {
                "port": {
                  "terms": { "field": "DstPort" }
                }
              }
            }
          }
        }
      }
    }

The query_string query on its own matches only 1266 hits, so I'm a bit confused by the OOM error.

Am I using aggregations incorrectly? If not, what can I do to fix this problem? Thanks!

2 answers

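You are loading all of the SrcIP, DstIP, and DstPort values into memory for the aggregation. This is because Elasticsearch un-inverts the entire field (the resulting in-memory structure is called fielddata) so that it can quickly look up a document's value for that field given the document's ID. This is done for every document in the index, not only for the 1266 documents that match your query, which is why the small hit count does not protect you from the OOM.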
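To check whether fielddata is what is filling the heap, you could look at fielddata memory per field through the node stats API. This is a sketch assuming a 1.x-era cluster; `fields=*` reports every field, and you would look for SrcIP, DstIP and DstPort in the output.

```json
# Per-node fielddata memory, broken down by field (all fields via the * wildcard)
GET /_nodes/stats/indices/fielddata?fields=*
```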

If you are mostly going to aggregate over a very small subset of your data, you should consider using doc values. With doc values, each document's value for a field is stored on disk in a form that can be looked up efficiently given the document ID. There is a bit more overhead, but this way you leave it to the operating system's file-system cache to keep the relevant pages in memory, instead of loading the entire field onto the JVM heap.
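A minimal mapping sketch of this, assuming a later 1.x release that accepts the `doc_values` mapping parameter (earlier releases configured doc values differently, so check the docs for your version). The index name `test-index-syslog3-v2` is hypothetical, and the field types are assumptions because the question does not show its mapping. In 1.x, doc values on string fields require `not_analyzed`, and an existing field's mapping generally cannot be switched in place, so this would go into a new index that you then reindex into.

```json
# Hypothetical new index; field types are assumed, not taken from the question
PUT /test-index-syslog3-v2
{
  "mappings": {
    "type-syslog": {
      "properties": {
        "SrcIP":      { "type": "string", "index": "not_analyzed", "doc_values": true },
        "DstIP":      { "type": "string", "index": "not_analyzed", "doc_values": true },
        "DstPort":    { "type": "integer", "doc_values": true },
        "DstCountry": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
```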

I'm not sure about your mapping, but judging by the value, the DstCountry field may be not_analyzed. How about replacing the query with a filter inside the aggregation? Maybe that helps.
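One way to read this suggestion, as a sketch: drop the `query_string` query and restrict the results to CN with a `filter` aggregation that wraps the original aggregations, using an exact `term` filter. This assumes DstCountry is `not_analyzed` (with the default analyzer, "CN" would be indexed as the lowercased token "cn" and the term filter would not match). The aggregation name `cn_only` is hypothetical, and `"size": 0` simply omits the hits. The nested terms aggregations still read SrcIP, DstIP and DstPort, so on its own this does not change how those fields are loaded.

```json
# Assumes DstCountry is not_analyzed; "cn_only" is a hypothetical aggregation name
POST /test-index-syslog3/type-syslog/_search
{
  "size": 0,
  "aggs": {
    "cn_only": {
      "filter": { "term": { "DstCountry": "CN" } },
      "aggs": {
        "whatever": {
          "terms": { "field": "SrcIP" },
          "aggs": {
            "destination_ip": {
              "terms": { "field": "DstIP" },
              "aggs": {
                "port": {
                  "terms": { "field": "DstPort" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```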

Also check whether the fields you use in your aggregations are mapped as not_analyzed.
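To check this, you could fetch the current mapping for the index and type from the question, assuming a 1.x cluster:

```json
# Show the mapping of type-syslog in the index from the question
GET /test-index-syslog3/_mapping/type-syslog
```

For each of SrcIP, DstIP and DstPort, look for `"index": "not_analyzed"`. If a string field shows no `index` setting, it is using the default, `analyzed`, so a terms aggregation on it builds buckets for the individual tokens rather than for the whole values.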
