I apologize for another beginner question, but I cannot understand what is going on here. I want to calculate the frequency of words in a file that contains one word per line. The file is really large, which may be part of the problem (in this example, it has 300 thousand lines).
I execute this command:
cat .temp_occ | uniq -c | sort -k1,1nr -k2 > distribution.txt
The problem is that the output is slightly wrong: identical words are counted as if they were different words. For example, these are the first entries:
306 continua
278 apertura
211 eventi
189 murah
182 giochi
167 giochi
As you can see, giochi appears twice.
Toward the bottom of the file it gets worse, and the output looks like this:
1 win
1 win
1 win
1 win
1 win
1 win
1 win
1 win
1 win
1 winchester
1 wind
1 wind
and so on, for all words.
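The same symptom shows up on a tiny input, so the file size does not seem to be the cause. Below is a minimal reproduction sketch; the sample words and the variable names `words` and `result` are made up for illustration and do not come from my real file:

```shell
# Hypothetical sample: one word per line, with repeated words
# that are NOT on adjacent lines.
words=$(printf '%s\n' giochi win giochi win wind)

# Same pipeline as in my command, fed from the sample instead of .temp_occ.
result=$(printf '%s\n' "$words" | uniq -c | sort -k1,1nr -k2)

# Show the counts produced for the sample.
printf '%s\n' "$result"
```

Each of the five lines comes out with a count of 1, even though giochi and win each occur twice in the sample.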
Sorry again for the basic question, but I am quite new to shell programming. What am I doing wrong?
Many thanks