AWK vs MySQL for data aggregation

When trying to find out whether AWK or MySQL is more efficient for processing log files and returning statistics, I noticed the following behavior, which does not make sense to me:

To test this, I used a file with 4 columns and approximately 9 million records. Both tests ran on the same server, a VPS with an SSD and 1 GB of RAM.

column1 has about 10 unique values, and the combination of all four columns has approximately 4k unique values.

In MySQL, I use a table defined as log_test (column1, column2, column3, column4) without indexes.

Data format:

    column1, column2, column3, column4
    column1, column2, column3, column4

AWK Script:

    BEGIN {
        FS = ",";
        time = systime();  # start time (never read later in this script)
    }
    {
        array[$1]++;  # first test
        # array[$1 "," $2 "," $3 "," $4]++;  # second test
    }
    END {
        for (value in array) {
            print "array[" value "]=" array[value];
        }
    }
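
For anyone who wants to try the two aggregations on a small scale, here is a minimal sketch. It assumes a POSIX shell with awk and sort available; the file names (sample.csv, by_col1.txt, by_all.txt) and the synthetic values are made up for illustration and do not reflect the original log data.

```shell
#!/bin/sh
# Generate a small synthetic file with 4 comma-separated columns.
# Column 1 takes 10 distinct values, mirroring the cardinality described above.
awk 'BEGIN {
    for (i = 0; i < 1000; i++)
        printf "a%d,b%d,c%d,d%d\n", i % 10, i % 5, i % 4, i % 2
}' > sample.csv

# First test: count rows per value of column 1.
awk -F, '{ count[$1]++ }
    END { for (k in count) print k "=" count[k] }' sample.csv | sort > by_col1.txt

# Second test: count rows per combination of all four columns,
# using the four fields joined by commas as a single array key.
awk -F, '{ count[$1 "," $2 "," $3 "," $4]++ }
    END { for (k in count) print k "=" count[k] }' sample.csv | sort > by_all.txt
```

With this generator the first test yields 10 keys and the second yields 20. Prefixing each awk command with `time` gives a rough comparison of the two key styles on the same input.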

MySQL queries:

Query 1:

    SELECT column1, count(*) FROM log_test GROUP BY column1;

Query 2:

    SELECT column1, column2, column3, column4, count(*)
    FROM log_test GROUP BY column1, column2, column3, column4;

AWK , MySQL. , , 10 , MySQL 7 , AWK 22 .

, awk , , , , 4k , AWK , , , , . AWK 90 , .1% MEM, MySQL 45 3% MEM.

  • AWK 2, 1, ?
  • AWK awk ?
  • MySQL , ?
  • ?

Awk ( ). , 2- 3-

, , ? Force awk, ( ),

MySQL , . , , , , awk ( MySQL, char (10) MySQL ).

, , . , , C, ( )


; , .

, Awk . , , , .

, :

    array[$1]++

10 $1, 20 ( MYSQL). 20 . 9 10 , 20 , , "" .

:

    array[$1 "," $2 "," $3 "," $4]++

80 , . 20 .

, 4000 , , 9 4000 80 .

, Awk /- ( , - , / ), , 10 4000 .

, AWK. 5 20 , 4 .

, , AWK MYSQL , MYSQL. , AWK MYSQL , , , MYSQL .

MYSQL , , QUERY, , .
