Match words against a vocabulary list and count occurrences

Question

I have an ordinary text file whose content varies and is in no particular order. I also have a list of words, and I want to count how many times each word on that list occurs in the text file.

For example, my list of words may consist of this:

good
bad 
cupid
banana
apple

Then I want to compare each of these individual words with my text file, which could be like this:

Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.

I want the output to show how many times each of the listed words occurs. I can do this with awk and a for loop, but I would really like to avoid the for loop, since it would take forever: my real word list is about 10,000 words.

For the example above, the total would be 9.

+4

list bash grep awk sed

CrudeCoder 07 . '13 19:09

4 answers

Using grep and wc:

cat <<EOF > word.list
good
bad 
cupid
banana
apple
EOF

cat <<EOF > input.txt
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

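# read -r keeps backslashes literal; quoting "$search" prevents word splitting and globbing
while read -r search ; do
    echo "$search: $(grep -o -- "$search" input.txt | wc -l)"
done < word.list | awk '{total += $2; print}END{printf "total: %s\n", total}'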

Output:

good: 3
bad: 2
cupid: 1
banana: 1
apple: 2
total: 9

+3

hek2mgl 07 . '13 19:14

Awk:

awk -f cnt.awk words.txt input.txt

cnt.awk:

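# First file (the word list): register each word with a count of zero.
FNR==NR {
    word[$1]=0
    next
}
# Second file (the text): append every line to one long string.
{
    str=str $0 RS
}
END{
    # For each word, match repeatedly and cut the string off after each match.
    for (i in word) {
        stri=str
        while(match(stri,i)) {
           stri=substr(stri,RSTART+RLENGTH)
           word[i]++
        }
    }
    for (i in word)
        print i, word[i]
}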

+2

Håkon Hægland 07 . '13 19:34

If you only need the total count, a shorter variant of @hek2mgl's approach:

while read word; do
    grep -o "$word" input.txt
done < words.txt | wc -l

If you want the count per word as well as the total:

while read word; do
    grep -o "$word" input.txt
done < words.txt | sort | uniq -c | awk '{ total += $1; print } END { print "total:", total }'

If you want to match whole words only, use grep's word-boundary anchors:

while read word; do
    grep -o "\<$word\>" input.txt
done < words.txt | sort | uniq -c | awk '{ total += $1; print } END { print "total:", total }'

With this, banana no longer matches bananas. If you do want banana to match bananas, drop the trailing anchor:

while read word; do
    grep -o "\<$word" input.txt
done < words.txt | sort | uniq -c | awk '{ total += $1; print } END { print "total:", total }'
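
To see how the three patterns differ, here is a minimal sketch on a shortened version of the sample sentence. It assumes GNU grep, where \< and \> are word-boundary extensions; the text and the variable names are only for illustration.

```shell
#!/bin/sh
# Shortened sample text (an assumption for illustration, based on the question).
text='meet a cupid eating an apple, apples, bananas, and other good fruits'

# Substring match: "apple" is also found inside "apples".
substring=$(printf '%s\n' "$text" | grep -o 'apple' | wc -l)

# Whole-word match: only the standalone "apple" counts.
whole=$(printf '%s\n' "$text" | grep -o '\<apple\>' | wc -l)

# Prefix match: the word only has to start with "banana", so "bananas" counts.
prefix=$(printf '%s\n' "$text" | grep -o '\<banana' | wc -l)

# $((...)) normalizes any padding that some wc implementations add.
echo "substring: $((substring)), whole word: $((whole)), prefix: $((prefix))"
```

On this text the substring pattern finds two matches, while the whole-word and prefix patterns find one each.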

If the word list is long, you can reduce the number of grep calls by searching for several words at once:

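# Join three words per line with |, then strip spaces and any trailing | left by a short final group
paste -d'|' - - - < words.txt | sed -e 's/ //g' -e 's/|*$//' | while read -r words; do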
    grep -oE "\<($words)\>" input.txt
done

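This still runs grep once per group of three words. To pass more words to each grep call, add more - arguments to paste, for example: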

paste -d'|' - - - - - - - - - - < words.txt | ...

Alternatively, see the awk solution by @HakonHægland.
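
The batching idea can be taken to its limit by letting grep read the whole list itself, so there is no shell loop at all. A minimal sketch, assuming GNU grep and whole-word matching; the scratch directory, words.clean and the result variable are hypothetical names used only to keep the example self-contained.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' good 'bad ' cupid banana apple > words.txt
cat > input.txt <<'EOF'
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

# Strip stray whitespace (the list contains "bad " with a trailing space) and
# drop empty lines, since an empty pattern would match every line.
sed -e 's/[[:space:]]//g' -e '/^$/d' words.txt > words.clean

# -f reads all patterns from the file, -F treats them as fixed strings,
# -w keeps whole-word matches only, -o prints each match on its own line.
result=$(grep -owFf words.clean input.txt | sort | uniq -c |
    awk '{ total += $1; print } END { print "total:", total + 0 }')
printf '%s\n' "$result"
```

Words without any match (here banana, because only bananas occurs in the text) are not listed at all. Dropping -w switches back to substring matching.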

+2

janos 07 . '13 19:56

Hynek -Pichi- Vychodil · Accepted Answer · 2013-12-07T20:31:02+0000

In Perl:

perl -nE'BEGIN{open my$fh,"<",shift;my@a=map lc,map/(\w+)/g,<$fh>;@h{@a}=(0)x@a;close$fh}exists$h{$_}and$h{$_}++for map lc,/(\w+)/g}{for(keys%h){say"$_: $h{$_}";$s+=$h{$_}}say"Total: $s"' word.list input.txt
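
The one-liner is dense. As far as it can be read, it loads the word list into a hash, splits the text into lowercase \w+ tokens, and increments the counter of every token found in the hash, so it counts whole words only and makes a single pass over the text. A rough awk sketch of the same idea, assuming each list line holds one word in its first field (the scratch directory and the result variable are hypothetical, there only to make the example self-contained):

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' good 'bad ' cupid banana apple > word.list
cat > input.txt <<'EOF'
Sometimes I travel to the good places that are good, and never the bad places that are bad. For example I want to visit the heavens and meet a cupid eating an apple. Perhaps I will see mythological creatures eating other fruits like apples, bananas, and other good fruits.
EOF

result=$(awk '
NR == FNR {                          # first file: the word list
    if ($1 != "") count[tolower($1)] = 0
    next
}
{                                    # second file: the text
    # Split the lowercased line on runs of non-word characters.
    n = split(tolower($0), token, /[^a-z0-9_]+/)
    for (i = 1; i <= n; i++)
        if (token[i] in count)
            count[token[i]]++
}
END {
    for (w in count) {
        print w ": " count[w]
        total += count[w]
    }
    print "Total: " (total + 0)
}' word.list input.txt)
printf '%s\n' "$result"
```

Each text token costs one hash lookup, so the running time does not grow with the number of patterns the way a per-word grep loop does. The order of the per-word lines is unspecified in awk; pipe the output through sort if a stable order matters.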
