Use jq to count on several levels

Question


We found some domain names associated with infections. Now we have a list of DNS lookups in a .json file, and I would like to get a summary showing: a list of users, the unique domains each one visited, and the total. Bonus points if I can also get a count per domain name.

Here is an example file:

 {"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
 {"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
 {"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
 {"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
 {"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}

Ideally, I would like the result to be something like:

 {"possible_victim01": "total": 3, {"evil.com": 2, "soevil.com": 1}} {"possible_victim02": "total": 1, {"bad.com": 1}} {"possible_victim03": "total": 1, {"soevil.com": 1}}

I would gladly agree to:

 {"possible_victim01": "total": 3, ["evil.com", "soevil.com"]} {"possible_victim02": "total": 1, ["bad.com"]} {"possible_victim03": "total": 1, ["soevil.com"]}

I can get the total number of entries for each user, but I am losing the list of domains:

 cat sample.json | jq -s 'group_by(.machine) | map({machine:.[0].machine,domain:.[0].domain, count:length}) '

 [{"machine": "possible_victim01", "domain": "evil.com", "count": 3}, {"machine": "possible_victim02", "domain": "bad.com", "count": 1}, {"machine": "possible_victim03", "domain": "soevil.com", "count": 1}]
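A minimal sketch of one way to keep the domains in that same command: collect each group's .domain values and pass them through unique, while length still counts every lookup. It assumes jq is on the PATH; sample_data is a hypothetical helper that just prints the sample records so the snippet runs on its own, and -c is used only to get one compact line of output.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# -s slurps every object into one array so group_by can see all of them.
# map(.domain) | unique keeps the distinct domains of each machine (sorted),
# while length still counts every lookup, duplicates included.
summary=$(sample_data | jq -c -s '
  group_by(.machine)
  | map({machine: .[0].machine,
         domains: (map(.domain) | unique),
         count: length})')
printf '%s\n' "$summary"
```

This is close to the second format in the question, expressed as valid JSON: one object per machine with a "domains" list and a "count".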

This post describes how to solve the second half of the problem: JQ Aggregations and Crosstabs. I have not found anything that describes the first half, that is, getting to:

 {"machine": "possible_victim01", "domain": "evil.com", "count":2}
 {"machine": "possible_victim01", "domain": "soevil.com", "count":1}
 {"machine": "possible_victim02", "domain": "bad.com", "count":1}
 {"machine": "possible_victim03", "domain": "soevil.com", "count":1}
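One possible sketch for that first half: group on the [machine, domain] pair instead of on machine alone, so each group is one machine/domain combination. It assumes jq is installed; sample_data is a hypothetical helper that prints the sample records.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# Grouping on the [machine, domain] pair yields one group per combination.
# group_by also sorts the groups, so lines come out ordered by machine, then
# domain. .[] emits one compact object per group instead of a single array.
pairs=$(sample_data | jq -c -s '
  group_by([.machine, .domain])
  | .[]
  | {machine: .[0].machine, domain: .[0].domain, count: length}')
printf '%s\n' "$pairs"
```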


Justchill Jun 24 '15 at 19:49


3 answers

Using group_by as described works, but if you have a very large number of lines (i.e. JSON objects) to read, as the sample suggests, you may run into performance problems and/or memory limits, since the whole input has to be slurped into one array and sorted.

These problems can be avoided very effectively with any version of jq that has the builtin "inputs" (for example, jq 1.5rc1).

Note that when using "inputs" you invoke jq with the -n option, for example:

 jq -n -f program.jq data.json
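To illustrate the pattern before the full program, here is a minimal sketch that only computes the per-machine totals with reduce over inputs. It assumes jq is installed; sample_data is a hypothetical helper standing in for data.json.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# With -n, "." starts as null and the builtin inputs yields the objects one
# at a time, so reduce folds them into the accumulator without building an
# array of all records first. .[$r.machine] += 1 works on a missing key
# because null + 1 is 1 in jq.
totals=$(sample_data | jq -c -n 'reduce inputs as $r ({}; .[$r.machine] += 1)')
printf '%s\n' "$totals"
```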

Note also that it is preferable to produce valid JSON here; something like the following seems to fit what you need:

 {"possible_victim01": { "total": 3, "evildoers": {"evil.com": 2, "soevil.com": 1} }, "possible_victim02": ...}

The following program could be made more concise, but it is written this way to make the process transparent, assuming a basic understanding of jq. If there is any magic here, it is that the special case of null does not need to be handled: in jq, null + 1 is 1, and updating a key of null creates a new object.

 reduce inputs as $line ({};
   . as $in
   | ($line.machine) as $machine
   | ($line.domain) as $domain
   | ($in[$machine].evildoers) as $evildoers
   | . + { ($machine): {
       "total": (1 + $in[$machine]["total"]),
       "evildoers": ($evildoers | (.[$domain] += 1))
     } }
 )

Using the provided sample, the output is:

 {
   "possible_victim01": {
     "total": 3,
     "evildoers": {
       "evil.com": 2,
       "soevil.com": 1
     }
   },
   "possible_victim02": {
     "total": 1,
     "evildoers": {
       "bad.com": 1
     }
   },
   "possible_victim03": {
     "total": 1,
     "evildoers": {
       "soevil.com": 1
     }
   }
 }
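For an end-to-end check, the sketch below writes the reduce program from this answer to a temporary file and runs it with jq -n -f, as in the command shown earlier. The temporary file replaces program.jq, and sample_data is a hypothetical helper replacing data.json; jq is assumed to be installed.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# Store the program in a temporary file (stand-in for program.jq).
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
reduce inputs as $line ({};
  . as $in
  | ($line.machine) as $machine
  | ($line.domain) as $domain
  | ($in[$machine].evildoers) as $evildoers
  | . + { ($machine): {
      "total": (1 + $in[$machine]["total"]),
      "evildoers": ($evildoers | (.[$domain] += 1))
    } }
)
EOF

# -n matters: without it jq would read the first object as "." and inputs
# would only see the remaining four, so one lookup would be lost.
report=$(sample_data | jq -c -n -f "$prog")
printf '%s\n' "$report"
```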


peak Jun 28 '15 at 6:15


Here is a solution using reduce, getpath and setpath:

 reduce .[] as $o ( {} ;
   [$o.machine, "total"] as $p1
   | [$o.machine, "domains", $o.domain] as $p2
   | setpath($p1; 1+getpath($p1))
   | setpath($p2; 1+getpath($p2))
 )

If filter.jq contains this filter, and data.json contains the sample data, then the command

 $ jq -M -s -f filter.jq data.json

produces

 {
   "possible_victim01": {
     "total": 3,
     "domains": {
       "evil.com": 2,
       "soevil.com": 1
     }
   },
   "possible_victim02": {
     "total": 1,
     "domains": {
       "bad.com": 1
     }
   },
   "possible_victim03": {
     "total": 1,
     "domains": {
       "soevil.com": 1
     }
   }
 }
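The same getpath/setpath filter can also be combined with the streaming approach from the previous answer. This is a sketch of that variation, not part of the original answer: it swaps reduce .[] under -s for reduce inputs under -n, so no array of all records is built. It assumes jq is installed; sample_data is a hypothetical helper standing in for data.json.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# getpath on a missing path returns null and 1 + null is 1, so the first
# occurrence of a machine or domain needs no special case.
report=$(sample_data | jq -c -n '
  reduce inputs as $o ({};
    [$o.machine, "total"] as $p1
    | [$o.machine, "domains", $o.domain] as $p2
    | setpath($p1; 1 + getpath($p1))
    | setpath($p2; 1 + getpath($p2)))')
printf '%s\n' "$report"
```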


jq170727 Sep 05 '17 at 7:38


Jimmy · Accepted Answer · 2015-06-24T20:16:01+0000

You need to use group_by twice: once to group by machine name, and then again within each group to get the counts for each domain.

jq query (it expects an array of all the records, so run it with jq -s):

 group_by(.machine)
 | map({
     "machine": .[0].machine,
     "total": length,
     "domains": (group_by(.domain)
                 | map({"key": .[0].domain, "value": length})
                 | from_entries)
   })

Example output:

 [
   { "machine": "possible_victim01", "total": 3, "domains": { "evil.com": 2, "soevil.com": 1 } },
   { "machine": "possible_victim02", "total": 1, "domains": { "bad.com": 1 } },
   { "machine": "possible_victim03", "total": 1, "domains": { "soevil.com": 1 } }
 ]
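If you would rather have a single object keyed by machine name (the shape the other answers produce), one option is a hypothetical extra step appended to this query: map each element to a {key, value} pair and use from_entries again. A sketch, assuming jq is installed and using a hypothetical sample_data helper for the input:

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints the sample records from the question, one per line.
sample_data() {
cat <<'EOF'
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071870}
{"machine": "possible_victim01", "domain": "evil.com", "timestamp":1435071875}
{"machine": "possible_victim01", "domain": "soevil.com", "timestamp":1435071877}
{"machine": "possible_victim02", "domain": "bad.com", "timestamp":1435071877}
{"machine": "possible_victim03", "domain": "soevil.com", "timestamp":1435071879}
EOF
}

# The accepted answer's query, then an added step: the machine name becomes
# the key and the rest of each object (del(.machine)) becomes the value.
report=$(sample_data | jq -c -s '
  group_by(.machine)
  | map({ "machine": .[0].machine,
          "total": length,
          "domains": (group_by(.domain)
                      | map({"key": .[0].domain, "value": length})
                      | from_entries) })
  | map({key: .machine, value: del(.machine)})
  | from_entries')
printf '%s\n' "$report"
```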
