Using awk to count the number of occurrences of a word in a column

03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66
03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF
03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA

I am trying to use awk to count the number of occurrences of the words "block" and "access" above in a single command.

At first I tried the word "block", but my counter does not work. Can anyone see where my code is wrong?

 awk ' BEGIN {count=0;} { if ($3 == "BLOCK") count+=1} end {print $count}' firewall.log 
+15
linux bash awk
6 answers

Use an array:

 awk '{count[$3]++} END {for (word in count) print word, count[word]}' file 

If you want the count for "BLOCK" specifically, use: END {print count["BLOCK"]}
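As a rough sketch of both variants, here they are run against the three log lines from the question. The `sample_log` helper is only a stand-in for the real file, and the `sort` and `+ 0` are my additions: awk does not guarantee the order of `for (word in count)`, and `+ 0` prints 0 rather than an empty line when there is no BLOCK at all.

```shell
# sample_log is a hypothetical helper that emits the log lines from the question.
sample_log() {
  printf '%s\n' \
    '03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66' \
    '03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF' \
    '03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA'
}

# Count every distinct value in column 3; sort makes the output order stable.
all_counts=$(sample_log | awk '{count[$3]++} END {for (word in count) print word, count[word]}' | sort)
echo "$all_counts"

# Print only the BLOCK count; "+ 0" forces a number even if BLOCK never appears.
block_count=$(sample_log | awk '{count[$3]++} END {print count["BLOCK"] + 0}')
echo "$block_count"
```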

+32

Here is a solution that needs almost no code. You can chain the steps together with pipes ("|").

 awk '{print $3}' file | sort | uniq -c 
  • awk '{print $3}'
    prints the 3rd column; awk's default field separator is whitespace.

  • sort
    sorts the results so that identical values are adjacent, which uniq requires.

  • uniq -c
    counts how many times each line repeats.
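The steps above can be tried on the lines from the question like this. It is only a sketch: `sample_log` is a made-up helper standing in for the real file, and the last command is my addition to strip the leading padding that `uniq -c` puts in front of each count (the width of that padding differs between implementations).

```shell
# sample_log is a hypothetical helper that emits the log lines from the question.
sample_log() {
  printf '%s\n' \
    '03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66' \
    '03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF' \
    '03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA'
}

# Extract column 3, group identical values together, then count each group.
column_counts=$(sample_log | awk '{print $3}' | sort | uniq -c)
echo "$column_counts"

# Optional: normalise the padded "count value" lines to "count value" with one space.
echo "$column_counts" | awk '{print $1, $2}'
```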

+17

The reason your code does not work is that END is case sensitive. Written as end, awk treats it as an ordinary variable used as a pattern; that variable is never set, so the pattern is always false and the last block is never executed. If you change it to END, the block will run.

Also, you do not need a BEGIN block, since an unset awk variable already evaluates to 0 in a numeric context.
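To see the difference, here is a small sketch that runs the same counter with end and with END, and without any BEGIN block. The `sample_log` helper is hypothetical and just reproduces the lines from the question; both commands use print count so that only the end/END spelling differs.

```shell
# sample_log is a hypothetical helper that emits the log lines from the question.
sample_log() {
  printf '%s\n' \
    '03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66' \
    '03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF' \
    '03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA'
}

# Lowercase "end" is an unset variable used as a pattern, so its action never runs.
broken=$(sample_log | awk '{ if ($3 == "BLOCK") count += 1 } end { print count }')
echo "with end: [$broken]"

# Uppercase "END" is the special pattern that runs once after the last line.
fixed=$(sample_log | awk '{ if ($3 == "BLOCK") count += 1 } END { print count }')
echo "with END: [$fixed]"
```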

Below I have added an alternative way to do this, which you can use instead.

This is similar to glenn's answer, but it only counts the words you want, so it should use a little less memory.


Using gawk (required for the third argument to match)

 awk 'match($3,/BLOCK|ALLOW/,b){a[b[0]]++}END{for(i in a)print i ,a[i]}' file 

This block is executed only if the third field contains BLOCK or ALLOW.
match stores the matched text in array b.
Then the element of array a for that matched text is incremented.

In the END block, each captured word is printed with its count.
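If gawk is not available, the same idea can be sketched with the two-argument match, which sets RSTART and RLENGTH in any POSIX awk. This is an alternative I am swapping in, not the command above. The `sample_log` helper and its fourth line (with DROP) are made up here to show that other words are ignored, and the `sort` is added because the `for (i in a)` order is not guaranteed.

```shell
# sample_log is a hypothetical helper; the DROP line is invented for illustration.
sample_log() {
  printf '%s\n' \
    '03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66' \
    '03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF' \
    '03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA' \
    '03/03/2014 12:31:26 DROP 10.1.34.3 00:11:22:33:44:55'
}

# match() sets RSTART/RLENGTH; substr() pulls out the matched word to use as the key.
portable_counts=$(sample_log |
  awk 'match($3, /BLOCK|ALLOW/) { a[substr($3, RSTART, RLENGTH)]++ }
       END { for (i in a) print i, a[i] }' | sort)
echo "$portable_counts"
```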


Output

 ALLOW 1
 BLOCK 2
+4

I checked your expression

 awk ' BEGIN {count=0;} { if ($3 == "BLOCK") count+=1} end {print $count}' firewall.log 

and was able to count BLOCK successfully by making two changes:

  • end should be in upper case (END)
  • remove the $ from print $count

So this should be:

 awk ' BEGIN {count=0;} { if ($3 == "BLOCK") count+=1} END {print count}' firewall.log 

A simpler statement that also works:

 awk '($3 == "BLOCK") {count++ } END { print count }' firewall.log 
+3

The error in your awk call is that you have print $count in your END block. This takes the value of the count variable, treats it as an integer, and prints the corresponding field of the last line of input. What you really need is just print count, which prints the value of the count variable. The syntax for referencing variables differs between bash, awk, python, etc., so it is an easy mistake to make.
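A small sketch of the difference, using the lines from the question (`sample_log` is a hypothetical helper). Here count ends up as 2, so $count means $2. This assumes an awk that keeps the last record available in END, as POSIX describes; on such an awk the first command prints the time field of the last line instead of the count.

```shell
# sample_log is a hypothetical helper that emits the log lines from the question.
sample_log() {
  printf '%s\n' \
    '03/03/2014 12:31:21 BLOCK 10.1.34.1 11:22:33:44:55:66' \
    '03/03/2014 12:31:22 ALLOW 10.1.34.2 AA:BB:CC:DD:EE:FF' \
    '03/03/2014 12:31:25 BLOCK 10.1.34.1 55:66:77:88:99:AA'
}

# $count is a field reference: with count == 2 it reads field 2 of the last record.
field_value=$(sample_log | awk '$3 == "BLOCK" { count++ } END { print $count }')
echo "print \$count gives: $field_value"

# count without the $ is the variable itself, i.e. the number of BLOCK lines.
counter_value=$(sample_log | awk '$3 == "BLOCK" { count++ } END { print count }')
echo "print count gives: $counter_value"
```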

+1

I have something similar -

I ask GitLab for a list of merge requests:

 curl -Ss -k --header "PRIVATE-TOKEN: $at" "https://gitlab/api/v4/projects/111/merge_requests?state=$1&created_after=$date&target_branch=$branch&per_page=100&page=1" | jq -r '.[] | "\(.iid)\t\(.author.username)"'

and I get output like this:

 11039 user7
 11038 user6
 11037 user5
 11036 user4
 11035 user1
 11034 user3
 11033 user2
 11032 user1

How can I count how many merge requests each user has raised, i.e. how many for user1, how many for user2, and so on?

When I store the output of this curl in a variable:

 request=$(curl -Ss -k --header "PRIVATE-TOKEN: $at" "https://gitlab/api/v4/projects/111/merge_requests?state=$1&created_after=$date&target_branch=$branch&per_page=100&page=1" | jq -r '.[] | "\(.iid)\t\(.author.username)"')

and print it as:

 echo "list of $1 requests rise today"
 echo "$request"
 echo
 echo "--------stats--------------"
 echo "\n$request" | awk '/^[0-9]/{a[$2]++}END{for (i in a) print i, a[i]}'
 echo "---------------------------"
 echo

This awk command does not show the correct counts for some options. Is there an easier way?

Thanks for the help.
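One possible cause, assuming the script runs under bash: echo does not expand \n there by default, so the first line reaches awk as \n11039 user7, fails the /^[0-9]/ test, and that merge request is not counted. A minimal sketch that avoids this by feeding the variable through printf is below. The value of request is just the sample output from this post, and the `sort` is added only to make the output order stable.

```shell
# Sample data copied from the output shown above (stand-in for the curl | jq result).
request='11039 user7
11038 user6
11037 user5
11036 user4
11035 user1
11034 user3
11033 user2
11032 user1'

# printf '%s\n' passes the text through unchanged, with no literal "\n" prefix.
stats=$(printf '%s\n' "$request" |
  awk '/^[0-9]/ { a[$2]++ } END { for (i in a) print i, a[i] }' | sort)

echo "--------stats--------------"
echo "$stats"
echo "---------------------------"
```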

0