I am new to Perl, and for one of my homework assignments I came up with this solution:
use strict;
use warnings;
if (@ARGV < 1)
{
    print "Usage is : words.pl filename\n";
    exit 1;
}
my $file = $ARGV[0];
open(my $fh, '<', $file) or die "Cannot open '$file': $!\n";
my (%count, %lcount);             # word => count, word length => count
my ($wcount, $dcount) = (0, 0);   # total words, distinct words
while (<$fh>)
{
    chomp;
    tr/A-Z/a-z/;
    tr/.,:;!?"(){}//d; # remove some common punctuation symbols
    # We are creating a hash with the word as the key.
    # Each time a word is encountered, its count is incremented by 1.
    # If the count for a word is 1, it is a new distinct word.
    # We keep track of the number of words parsed so far.
    # We also keep track of the number of words of a particular length.
    foreach my $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}
close($fh);
# To print the distinct words and their frequency,
# we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach my $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}
# For the word length and frequency we use the word length hash.
print "The word length and frequency in the given text is:\n";
foreach my $w (sort keys %lcount)
{
    print "$w : $lcount{$w}\n";
}
print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";
# Calculating the type-token ratio (guarding against an empty file).
my $ttratio = $wcount ? ($dcount / $wcount) * 100 : 0;
print "The type-token ratio of the file is $ttratio.\n";
I included comments to explain what each part does. What I actually need is to count the words in a given text file. The output of the above program will look like this:
The words and their frequency in the text is:
1949 : 1
a : 1
adopt : 1
all : 2
among : 1
and : 8
assembly : 1
assuring : 1
belief : 1
citizens : 1
constituent : 1
constitute : 1
.
.
.
The word length and frequency in the given text is:
1 : 1
10 : 5
11 : 2
12 : 2
2 : 15
3 : 18
There are 85 words in the file.
There are 61 distinct words in the file.
The type-token ratio of the file is 71.7647058823529.
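The word-length section above is listed as 1, 10, 11, 12, 2, 3 because sort compares hash keys as strings unless told otherwise. A small sketch using only the numbers from that sample output shows the numeric alternative and the type-token ratio arithmetic (61 / 85 × 100 ≈ 71.76); rounding to two decimals with printf is an assumption about how the ratio should be displayed, not part of the original program.

```perl
use strict;
use warnings;

# Word-length keys as they appear in the sample output above (hash keys are strings).
my @lengths = (1, 10, 11, 12, 2, 3);

my @string_order  = sort @lengths;                 # default sort compares as strings
my @numeric_order = sort { $a <=> $b } @lengths;   # <=> compares as numbers

print "string:  @string_order\n";    # string:  1 10 11 12 2 3
print "numeric: @numeric_order\n";   # numeric: 1 2 3 10 11 12

# Type-token ratio from the totals reported in the sample output.
my ($distinct, $total) = (61, 85);
my $ttr = $total ? 100 * $distinct / $total : 0;   # guard against an empty file
printf "type-token ratio: %.2f\n", $ttr;           # type-token ratio: 71.76
```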
With the help of Google I was able to find a solution for my homework, but I think a shorter, more concise version is possible using the real power of Perl. Can someone show me a solution in Perl with far fewer lines of code?
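One possible shorter version, offered as a sketch rather than a definitive answer: the distinct-word count is simply the number of keys in the word hash, and the total word count is the sum of its values, so the separate counters are not strictly needed. word_stats is a hypothetical helper name; the punctuation list is copied from the code above, and sum comes from the core List::Util module. The driver at the bottom only runs when a file name is given on the command line.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# word_stats is a hypothetical helper name. Given a list of text lines it
# returns: word => count, length => count, total words, distinct words.
sub word_stats {
    my (%count, %len);
    for my $line (@_) {
        (my $clean = lc $line) =~ tr/.,:;!?"(){}//d;  # same cleanup as the question
        $count{$_}++ for split ' ', $clean;           # split ' ' also skips the newline
    }
    $len{length $_} += $count{$_} for keys %count;    # length frequency per token
    return (\%count, \%len, sum(0, values %count), scalar keys %count);
}

# Driver: runs only when a file name is passed, e.g. perl words.pl input.txt
if (@ARGV) {
    my ($count, $len, $total, $distinct) = word_stats(<>);
    print "$_ : $count->{$_}\n" for sort keys %$count;
    print "$_ : $len->{$_}\n"   for sort { $a <=> $b } keys %$len;  # numeric order
    print "There are $total words, $distinct distinct.\n";
    printf "Type-token ratio: %.2f\n", $total ? 100 * $distinct / $total : 0;
}
```

Moving the counting into a sub that takes a list of lines keeps the logic separate from file handling, so it can be checked on a couple of hard-coded strings before pointing it at the real text file.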