I am parsing a set of data into an ELK stack for some non-technical users to view. As part of this, I want to remove all fields except a specific, known subset of fields from the events before sending them to Elasticsearch.
I can explicitly specify each field to drop in a mutate filter, like this:
filter {
  mutate {
    remove_field => [ "throw_away_field1", "throw_away_field2" ]
  }
}
In this case, any time a new field is added to the input data (which can happen often, since the data is pulled from a queue and consumed by multiple systems for multiple purposes), the filtering would need to be updated as well, which adds overhead that isn't needed. Not to mention, if some sensitive data slipped through in the window between when the input streams were updated and when the filtering was updated, that could be bad.
Is there a way to have the Logstash filter iterate over each field of an event and remove_field it if it is not in a provided list of field names? Or would I have to write a custom filter for this? Basically, for every single event, I just want to keep 8 specific fields and toss absolutely everything else.
It looks like some very minimal conditional logic, such as if ![field] =~ /^value$/, is available in logstash.conf, but I don't see any examples that iterate over the fields themselves in a for-each style and compare each field name against a list of values.
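Conceptually, what I'm after is something like the following sketch using the stock ruby filter. The field names are placeholders, and I haven't verified this is the idiomatic approach; the @-prefixed check is just to avoid dropping internal fields like @timestamp and @version:

filter {
  ruby {
    code => "
      # placeholder list of fields to keep
      keep = [ 'fieldtokeep1', 'fieldtokeep2' ]
      event.to_hash.keys.each do |name|
        # keep internal fields such as @timestamp and @version
        event.remove(name) unless keep.include?(name) || name.start_with?('@')
      end
    "
  }
}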
Answer:
After upgrading Logstash to 1.5.0 to be able to use plugin extensions such as prune, the solution ended up looking like this:
filter {
  prune {
    interpolate => true
    whitelist_names => [ "fieldtokeep1", "fieldtokeep2" ]
  }
}
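One caveat, as far as I understand the prune filter docs: whitelist_names entries are matched as regular expressions, so an unanchored name like "fieldtokeep1" will also keep any field whose name merely contains that substring. Anchoring the patterns, and whitelisting core fields such as @timestamp if your pipeline needs them, may be safer:

filter {
  prune {
    interpolate => true
    # anchored patterns match field names exactly
    whitelist_names => [ "^fieldtokeep1$", "^fieldtokeep2$", "^@timestamp$" ]
  }
}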