BASH - count the number of matching lines in a file

I have a forum thread where people post their Top 10 song lists. I want to count how many times each song is mentioned. Songs should be compared case-insensitively.

Example file structure:

Join Date: Apr 2005
Location: bama via new orleans
Age: 48
Posts: 2,369
Re: Top 10 Songs Jethro Tull
oh dearrrr. the only way for all kaths to keep their last shred of sanity: fly through this list as quickly as possible, without stopping to think for a microsecond...
velvet green
dun ringill
skating away on the thin ice of a new day
sossity yer a woman
fat man
life a long song
jack-a-lynn
teacher
mother goose
elegy

 03-10-2010, 02:29 AM      #5 (permalink)
Sox
Avoiding The Swan Song



Join Date: Jan 2010
Location: Derbyshire, England
Age: 43
Posts: 5,991
 Re: Top 10 Songs Jethro Tull
Wow !!!! Where do I start ?
Dun Ringill
Aqualung
With You There To Help Me
Jack Frost And The Hooded Crow
We Used To Know
Witch Promise
Pussy Willow
Heavy Horses
My Sunday Feeling
Locomotive Breath

Join Date: Nov 2009
Posts: 1,418
 Re: Top 10 Songs Jethro Tull
Too bad they all can't make the list, but here ten I never get tired of listening to:

Christmas Song
Witches Promise
Life A Long Song
Living In The Past
Rainbow Blues
Sweet Dream
Minstral In The Gallery
Cup of Wonder
Rover
Something On the Move

Example output:

life a long song 3
aqualung 1
...
3 answers

Your "file structure" is a little light on actual structure, so you will have to live with a few stray lines in the result.

Assuming everything is in a file named input, try:

tr '[:upper:]' '[:lower:]' < input |
     grep -Ev '^ *(join date|age|posts|location|re):|^ *$' |
     sort |
     uniq -c

The first line converts everything to lowercase, the second removes blank lines and lines that look like the post headers in your example, and the last two sort the lines and count each unique one. (egrep is deprecated, so grep -E is used instead.)
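If you want the most-mentioned songs listed first, one option is to wrap the same kind of pipeline in a function and finish with sort -rn. This is a sketch, not part of the original answer: count_songs is a hypothetical name, and the sed step is an added assumption that stray leading or trailing spaces should not turn one song into two entries.

```shell
# count_songs FILE  (hypothetical helper name)
# Lowercase everything, drop header-like lines, trim surrounding
# whitespace, drop blank lines, then count and rank by frequency.
count_songs() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        grep -Ev '^ *(join date|age|posts|location|re):' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -v '^$' |
        sort |
        uniq -c |
        sort -rn
}
```

Usage: count_songs input. Each output line is still "count song" as produced by uniq -c; the padding width of the count differs between GNU and BSD uniq.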

sort nameFile | uniq -c 
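As written, this counts "Dun Ringill" and "dun ringill" as two different songs. A minimal case-insensitive sketch, assuming GNU or BSD uniq (both support -i, but POSIX does not require it); count_ci is a hypothetical helper name:

```shell
# count_ci FILE  (hypothetical helper name)
# sort -f folds case while sorting, so case variants of a line end up
# next to each other; uniq -ci then counts them as one group.
count_ci() {
    sort -f "$1" | uniq -ci
}
```

uniq -i prints the spelling of the first line in each group, so the reported capitalization depends on sort order.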

With awk:

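awk '
/:/ || /^$/ { next }                 # skip lines containing ":" and empty lines
{ a[toupper($0)]++ }                 # count every other line, uppercased
END { for (i in a) print i, a[i] }   # print each line with its count
' INPUT_FILE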

Explanation:

First, we find the lines that contain a : or are empty and skip them. Every other line is converted to uppercase and used as a key in the array a, whose value counts how often that line appears. In the END block, we print every key in the array together with the number of times it was found.

Test:

awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' file1
SOX 1
CHRISTMAS SONG 1
CUP OF WONDER 1
SOSSITY YER A WOMAN 1
FAT MAN 1
PUSSY WILLOW 1
VELVET GREEN 1
WITH YOU THERE TO HELP ME 1
ELEGY 1
WE USED TO KNOW 1
TEACHER 1
MY SUNDAY FEELING 1
SWEET DREAM 1
JACK-A-LYNN 1
SOMETHING ON THE MOVE 1
ROVER 1
DUN RINGILL 2
AVOIDING THE SWAN SONG 1
JACK FROST AND THE HOODED CROW 1
WITCHES PROMISE 1
LIFE A LONG SONG 2
LIVING IN THE PAST 1
WITCH PROMISE 1
WOW !!!! WHERE DO I START ? 1
SKATING AWAY ON THE THIN ICE OF A NEW DAY 1
MINSTRAL IN THE GALLERY 1
RAINBOW BLUES 1
MOTHER GOOSE 1
HEAVY HORSES 1
AQUALUNG 1
LOCOMOTIVE BREATH 1
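The for (i in a) loop in the awk answer visits the keys in no particular order, which is why the test output above is unsorted. Below is a sketch of a variant, not the answerer's code, that lowercases to match the question's example output, also skips lines containing only spaces or tabs, and prints the most-mentioned songs first in the "song count" format; rank_songs is a hypothetical name.

```shell
# rank_songs FILE  (hypothetical helper name)
# Same approach as the awk answer: skip lines with ":" and blank lines,
# count the rest in an associative array, then print every key with its
# count. Output is sorted by count (highest first), then alphabetically.
rank_songs() {
    awk '
        /:/ || /^[ \t]*$/ { next }    # skip header-like and blank lines
        { n[tolower($0)]++ }          # count each line, lowercased
        END { for (s in n) printf "%d\t%s\n", n[s], s }
    ' "$1" |
        sort -k1,1nr -k2 |
        awk 'BEGIN { FS = "\t" } { print $2, $1 }'
}
```

Usage: rank_songs input prints lines such as "life a long song 2". As in the original answer, any line containing a colon is skipped, so a song title with a ":" in it would be lost.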