Check if two lines start with the same value: if so, print the average, otherwise print the actual value

Question

I would like to check if two rows start with the same number in the 1st column, if that happens then the average value of the second column should be displayed. Example file:

    01 21 6 10% 93.3333%
    01 22 50 83.3333% 93.3333%
    02 20.5 23 18.1102% 96.8504%
    02 21.5 100 78.7402% 96.8504%
    03 22.2 0 0% 100%
    03 21.2 29 100% 100%
    04 22.5 1 5.55556% 100%
    04 23.5 17 94.4444% 100%
    05 22.7 9 7.82609% 100%
    05 21.7 106 92.1739% 100%
    06 23 11 17.4603% 96.8254%
    06 22 50 79.3651% 96.8254%
    07 20.5 14 18.6667% 96%
    07 21.5 58 77.3333% 96%
    08 21.8 4 100% 100%
    09 22.6 0 0% 100%
    09 21.6 22 100% 100%

For example, the first two lines both begin with 01, but only one line starts with 08 (the 15th line). The result for these two cases should therefore be:

    01 21.5 ... ... ...
    08 21.8 ... ... ...

I ended up with the following awk line, which works fine when every value in the first column appears on exactly two lines, but it fails on the file shown above (because of the 15th line):

 awk '{sum+=$2} (NR%2)==0{print sum/2; sum=0;}'

Any hints are welcome,

+6

bash shell awk

Gery, Sep 25 '15 at 19:57

4 answers

Using GNU awk

    gawk '
        {sum[$1]+=$2; n[$1]++}
        END {
            PROCINFO["sorted_in"] = "@ind_num_asc"
            for (key in sum) print key, sum[key]/n[key]
        }
    ' file

    01 21.5
    02 21
    03 21.7
    04 23
    05 22.2
    06 22.5
    07 21
    08 21.8
    09 22.1

The PROCINFO line makes the for loop traverse the array in ascending numeric order of its indices. Without it, the traversal order is unspecified, so the output may appear in a seemingly random order.

+4

glenn jackman, Sep 25 '15 at 20:13

awk, with the output piped to sort:

 awk '{s[$1]+=$2;c[$1]++} END{for(i in s) print i, s[i]/c[i]}' file | sort
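A note on ordering: plain sort compares lines as text, which gives the right order for the question's file only because its keys are zero-padded (01 to 09). As a minimal sketch, assuming keys that are not zero-padded, the pipeline could sort numerically on the first field instead. The function name avg_sorted and the sample input are illustrative assumptions, not part of the answer.

```shell
# avg_sorted is a hypothetical name used only for this sketch.
avg_sorted() {
  # Sum and count column 2 per key, print each key with its mean,
  # then sort the result numerically on the first field.
  awk '{ s[$1] += $2; c[$1]++ } END { for (i in s) print i, s[i] / c[i] }' "$@" |
    sort -k1,1n
}

# Unpadded keys: a plain text sort would place 10 before 2.
printf '%s\n' '10 4' '2 1' '2 3' '10 6' '9 5' | avg_sorted
```

On this sample the keys should come out in the order 2, 9, 10.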

+1

karakfa, Sep 25 '15 at 20:22

    awk '
    second{
        # A value is pending from the previous row.
        if($1 == first){
            print (second + $2) / 2
            second = 0
            next
        }
        else
            print second
    }
    {
        printf "%s ", $1
        first = $1
        second = $2
    }
    END{
        if(second)
            print second
    }' file

+1

Costas, Sep 25 '15 at 20:51

anubhava · Accepted Answer · 2015-09-25T20:13:51+0000

This awk should work:

    awk 'function dump(){if (n>0) printf "%s%s%.1f\n", p, OFS, sum/n}
         NR>1 && $1 != p{dump(); sum=n=0}
         {p=$1; sum+=$2; n++}
         END{dump()}' file

    01 21.5
    02 21.0
    03 21.7
    04 23.0
    05 22.2
    06 22.5
    07 21.0
    08 21.8
    09 22.1

Explanation: We use 3 variables:

    p   -> to hold previous row $1 value
    n   -> count of similar $1 values
    sum -> is sum of $2 values for similar $1 rows

How it works:

    NR>1 && $1 != p       # true when the record number is > 1 and the current $1 differs from the previous $1
    dump()                # function that prints the formatted value of $1 and the average
    p=$1; sum+=$2; n++    # sets p to $1, adds the current $2 to sum and increments n
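To make these steps concrete, here is a sketch of the same logic spread over several commented lines and run on a few rows from the question's file. The wrapper name avg_by_key and the inline sample are illustrative assumptions. The sketch prints one decimal place and, like the one-liner, assumes that rows sharing a first column are adjacent in the input.

```shell
# avg_by_key is a hypothetical wrapper name, chosen only for this sketch.
# Assumes rows sharing the same first column are adjacent in the input.
avg_by_key() {
  awk '
    # Print the key and the mean of column 2 for the group just finished.
    function dump() { if (n > 0) printf "%s%s%.1f\n", p, OFS, sum / n }
    # First column changed: flush the previous group and reset the totals.
    NR > 1 && $1 != p { dump(); sum = n = 0 }
    # Every row: remember the key, accumulate column 2, count the row.
    { p = $1; sum += $2; n++ }
    # Flush the last group at end of input.
    END { dump() }
  ' "$@"
}

# Sample rows taken from the question: two for 01, one for 08, two for 09.
printf '%s\n' \
  '01 21 6 10% 93.3333%' \
  '01 22 50 83.3333% 93.3333%' \
  '08 21.8 4 100% 100%' \
  '09 22.6 0 0% 100%' \
  '09 21.6 22 100% 100%' | avg_by_key
```

On this sample it should print 01 21.5, 08 21.8 and 09 22.1, one per line: the single 08 row is reported as is, because its group contains only one value.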
