Elasticsearch query with nested aggregates causing out of memory

Question

I have Elasticsearch installed with 16GB of memory. I started using aggregations, but ran into the error "java.lang.OutOfMemoryError: Java heap space" when I tried to execute the following query:

POST /test-index-syslog3/type-syslog/_search
{
  "query": {
    "query_string": {
      "default_field": "DstCountry",
      "query": "CN"
    }
  },
  "aggs": {
    "whatever": {
      "terms": { "field": "SrcIP" },
      "aggs": {
        "destination_ip": {
          "terms": { "field": "DstIP" },
          "aggs": {
            "port": {
              "terms": { "field": "DstPort" }
            }
          }
        }
      }
    }
  }
}

The query_string query on its own returns only 1266 hits, so I'm a bit confused by the OOM error.

Am I using aggregations incorrectly? If not, what can I do to fix this problem? Thanks!

elasticsearch

Sgt b Mar 07 '14 at 16:46

2 answers

I'm not sure about your mapping, but judging by the value, the DstCountry field could be not_analyzed. In that case you could replace the query with a filter inside the aggregation. Maybe that helps.

Also check whether the fields you use in your aggregations are mapped as not_analyzed.
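
As a rough sketch of the suggestion above, the request body could wrap the nested terms aggregations in a filter aggregation instead of using query_string. The body is built here as a Python dict so it can be printed as JSON; the names build_filtered_aggs, country_filter and by_src_ip are made up for illustration, and the term filter assumes DstCountry is not_analyzed. Note that this only changes which documents are aggregated; it does not by itself change how much field data gets loaded.

```python
import json


def build_filtered_aggs(country="CN"):
    """Build a search body that filters inside the aggregation.

    Hypothetical helper: the aggregation names are illustrative only.
    """
    # Same three-level terms nesting as in the question.
    nested_terms = {
        "by_src_ip": {
            "terms": {"field": "SrcIP"},
            "aggs": {
                "destination_ip": {
                    "terms": {"field": "DstIP"},
                    "aggs": {"port": {"terms": {"field": "DstPort"}}},
                }
            },
        }
    }
    # The filter aggregation narrows the document set before the terms run.
    # Assumes DstCountry is not_analyzed, so "CN" is matched as an exact term.
    return {
        "aggs": {
            "country_filter": {
                "filter": {"term": {"DstCountry": country}},
                "aggs": nested_terms,
            }
        }
    }


body = build_filtered_aggs("CN")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request as in the question.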

Jettro coenradie Mar 07 '14 at 17:14

Alex Brasetvik · Accepted Answer · 2014-03-07T18:02:31+0000

You are loading the entire SrcIP, DstIP and DstPort fields into memory in order to aggregate on them. This is because Elasticsearch un-inverts the whole field so that it can quickly look up a document's value for that field given the document's ID. That happens for every document in the index, not just for the 1266 hits.
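
To make "un-inverting" a little more concrete, here is a toy model in Python. It is not how Lucene or Elasticsearch actually store field data; the uninvert function and the sample IPs are invented for illustration. It only shows that turning a term-to-documents index into a document-to-value column touches every document that has the field, however few documents the query matches.

```python
def uninvert(inverted_index, num_docs):
    """Turn term -> [doc ids] into a per-document column of terms."""
    # One slot per document in the index, whether or not the query matches it.
    column = [[] for _ in range(num_docs)]
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id].append(term)
    return column


# Toy inverted index for a SrcIP-like field over 5 documents.
inverted = {
    "10.0.0.1": [0, 3],
    "10.0.0.2": [1],
    "10.0.0.3": [2, 4],
}

column = uninvert(inverted, num_docs=5)

# Suppose the query matched a single document.
matching_docs = [3]
values_for_hits = [column[doc_id] for doc_id in matching_docs]

print(len(column))       # size of the structure that had to be built
print(values_for_hits)   # the part the aggregation actually needed
```

The column has one entry per document in the index, while the aggregation only reads the entries for the hits.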

If you are mostly going to aggregate on a very small subset of the data, you should look into using doc values. A document's value is then stored on disk in a way that makes it easy to look up given the document's ID. There is a bit more overhead, but this way you leave it to the operating system's page cache to keep the relevant pages in memory, instead of having to load the entire field onto the heap.
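
A mapping with doc values enabled might look roughly like the sketch below, again built as a Python dict and printed as JSON. The field types are assumptions, since the question does not show the mapping, and doc_values_mapping is a hypothetical helper. The doc_values flag shown here is the form used by later 1.x releases; the exact setting has changed between versions, so check the mapping reference for the version you run.

```python
import json


def doc_values_mapping(type_name="type-syslog"):
    """Build a mapping that stores the aggregated fields as doc values.

    Field types are assumed; adjust them to match the real mapping.
    """
    fields = {
        # In 1.x, string fields need to be not_analyzed to use doc values.
        "SrcIP": {"type": "string", "index": "not_analyzed", "doc_values": True},
        "DstIP": {"type": "string", "index": "not_analyzed", "doc_values": True},
        "DstPort": {"type": "integer", "doc_values": True},
    }
    return {type_name: {"properties": fields}}


mapping = doc_values_mapping()
print(json.dumps(mapping, indent=2))
```

Doc values are written at index time, so existing documents have to be reindexed before the aggregation can use them.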
