While trying to find out whether AWK or MySQL is more efficient at processing log files and returning statistics, I noticed the following behavior, which does not make sense to me.
To test this, I used a file with 4 columns and approximately 9 million records. I used the same server, which is a VPS with SSD and 1 GB of RAM.
column1 has about 10 unique values, and the number of unique combinations of all four columns is approximately 4k.
In MySQL, I use a table defined as (column1, column2, column3, column4), with no indexes.
Data format:
column1, column2, column3, column4
column1, column2, column3, column4
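If anyone wants to reproduce the setup, a synthetic file in the same 4-column format can be generated like this (the file name, row count, and value ranges are invented for illustration, not the original data set):

```shell
# Generate a sample CSV with 4 comma-separated columns.
# Column 1 draws from ~10 distinct values, mimicking the description above.
awk 'BEGIN {
    srand(42);
    for (i = 0; i < 100000; i++)
        printf "val%d,%d,%d,%d\n",
               int(rand() * 10), int(rand() * 5),
               int(rand() * 10), int(rand() * 8);
}' > sample_log.csv
wc -l sample_log.csv
```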
AWK Script:
BEGIN {
    FS = ",";
    time = systime();
}
{
    array[$1]++;                         # first test
    # array[$1 "," $2 "," $3 "," $4]++; # second test
}
END {
    for (value in array) {
        print "array[" value "]=" array[value];
    }
    print "elapsed seconds: " systime() - time;  # time was set in BEGIN but never reported
}
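For reference, here is a minimal, self-contained run of the same counting technique on a tiny inline data set (the file name and values are made up for illustration; `sort` is only added to make the output order deterministic, since awk's `for (v in array)` traversal order is unspecified):

```shell
# Build a tiny 4-column CSV and count occurrences of column1,
# exactly as the script above does for its first test.
printf 'a,1,2,3\na,1,2,3\nb,9,9,9\n' > tiny.csv
awk 'BEGIN { FS = "," }
     { array[$1]++ }
     END { for (v in array) print "array[" v "]=" array[v] }' tiny.csv | sort
# → array[a]=2
#   array[b]=1
```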
MySQL queries:
Query 1: SELECT column1, count(*) FROM log_test GROUP BY column1;
Query 2: SELECT column1, column2, column3, column4, count(*)
FROM log_test GROUP BY column1, column2, column3, column4;
I expected AWK to be faster than MySQL, but it is not: in the first test, with about 10 unique values, MySQL takes 7 seconds while AWK takes 22 seconds.
In the second test, where the key is the combination of all four columns (about 4k unique values), AWK takes 90 seconds at about 0.1% MEM, while MySQL takes 45 seconds at about 3% MEM.
- Why does AWK take so much longer for test 2 than for test 1?
- Is my AWK script written correctly, or is there a faster way to do this in awk?
- Why is MySQL faster here, even without any indexes?
- Is there a better alternative for this kind of processing?
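On the last question, one alternative worth timing is to let coreutils do the grouping. This is only a sketch under the assumptions above (the file name is hypothetical); note that forcing `LC_ALL=C` disables locale-aware string handling, which often speeds up both awk and sort considerably on large ASCII logs:

```shell
# Group-by-count on column1 using cut | sort | uniq -c.
# LC_ALL=C makes comparisons byte-wise, which is usually faster.
printf 'a,1,2,3\na,1,2,3\nb,9,9,9\n' > sample.csv
LC_ALL=C cut -d, -f1 sample.csv | LC_ALL=C sort | uniq -c
```

Each output line is a count followed by the key (e.g. `2 a`), equivalent to the awk array dump apart from formatting.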