I have a bunch of company data in an ES database. I want the number of documents submitted by each company, but I am having problems with the aggregation request. I want to exclude terms such as "Corporation" or "Inc." So far, I have been able to do this successfully for only one term at a time, using the request below.
{ "aggs" : { "companies" : { "terms" : { "field" : "Companies.name", "exclude" : "corporation" } } } }
Which returns
"aggregations": { "assignee": { "buckets": [ { "key": "inc", "doc_count": 375 }, { "key": "company", "doc_count": 252 } ] } }
Ideally, I would like to be able to do something like
{ "aggs" : { "companies" : { "terms" : { "field" : "Companies.name", "exclude" : ["corporation", "inc.", "inc", "co", "company", "the", "industries", "incorporated", "international"], } } } }
But I have not found a syntax for this that does not throw an error.
I reviewed the "Terms" section of the aggregation chapter in the ES documentation and could only find an example with a single exclusion. Is it possible to exclude multiple terms, and if so, what is the correct syntax?
Note: I know that I could set the field to "not_analyzed" and get buckets for full company names rather than for the individual tokens. However, I am hesitant to do this, because analysis makes the bucketing more tolerant of name variations (for example, Microsoft Corp vs. Microsoft Corporation).
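For reference, here is a minimal sketch of one form the multi-term exclusion might take, assuming "exclude" is interpreted as a regular expression (as the single-pattern example in the documentation suggests), so that the stop terms can be joined with `|`. The helper name `build_exclude_agg` is hypothetical and only builds the request body; it has not been run against a cluster, and whether a plain array is also accepted may depend on the ES version.

```python
import json
import re

# Stop terms taken from the desired request above.
STOP_TERMS = ["corporation", "inc.", "inc", "co", "company", "the",
              "industries", "incorporated", "international"]

def build_exclude_agg(field, terms, agg_name="companies"):
    """Build a terms aggregation body whose "exclude" is one regex alternation.

    Hypothetical helper: it only constructs the JSON body, it does not
    contact Elasticsearch.
    """
    # Escape regex metacharacters (e.g. the dot in "inc.") so each term
    # is matched literally, then join the terms into a single pattern.
    pattern = "|".join(re.escape(t) for t in terms)
    return {"aggs": {agg_name: {"terms": {"field": field, "exclude": pattern}}}}

body = build_exclude_agg("Companies.name", STOP_TERMS)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same search request as before; only the "exclude" value differs from the single-term version.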