Most Common Words Between Two Files Using Python

Question

I am new to Python and am trying to write a script that finds the most common words between two files. I can find the most common words in each file separately, but I am not sure how to find, say, the 5 words that are most common in both files. The words must appear in both files, and their frequency in both files should also be high.

import re
from collections import Counter

# Punctuation to strip. The hyphen goes last so that ")-=" is not read as a
# character range (which would also strip digits and other symbols).
PUNCTUATION = re.compile(r'[,.<;:)=!>_(?"-]')


def read_words(path):
    # Return the lowercased, punctuation-free words of the file as a list.
    with open(path, "r") as handle:
        return PUNCTUATION.sub('', handle.read()).lower().split()


words1 = read_words("test3.txt")
words = read_words("test4.txt")

# Extra word list, one entry per line (loaded but not used below).
with open("test2.txt", "r") as f:
    sWords = [line.strip() for line in f]

mc = Counter(words).most_common()
mc2 = Counter(words1).most_common()

print(len(mc))
print(len(mc2))

Examples of test3 and test4 are given below. test3:

Essays are generally scholarly pieces of writing giving the author own argument, but the definition is vague, overlapping with those of an article, a pamphlet and a short story.

test4:

Essays are generally scholarly pieces of writing giving the author own argument, but the definition is vague, overlapping with those of an article, a pamphlet and a short story.

Essays can consist of a number of elements, including: literary criticism, political manifestos, learned arguments, observations of daily life, recollections, and reflections of the author. Almost all modern essays are written in prose, but works in verse have been dubbed essays (e.g. Alexander Pope An Essay on Criticism and An Essay on Man). While brevity usually defines an essay, voluminous works like John Locke An Essay Concerning Human Understanding and Thomas Malthus An Essay on the Principle of Population are counterexamples. In some countries (e.g., the United States and Canada), essays have become a major part of formal education. Secondary students are taught structured essay formats to improve their writing skills, and admission essays are often used by universities in selecting applicants and, in the humanities and social sciences, as a way of assessing the performance of students during final exams.

python python-2.7

user3314492 Jun 05 '15 at 7:48

2 answers

Kasramvd · Answer 1 · 2015-06-05T07:51:29+0000

You can simply find the intersection of your Counter objects with the & operator:

mc = Counter(words)
mc2 = Counter(words1)
total=mc&mc2
mos=total.most_common(N)

Example:

>>> d1={'a':5,'f':2,'c':1,'h':2,'t':4}
>>> d2={'a':3,'b':2,'e':1,'h':5,'t':6}
>>> c1=Counter(d1)
>>> c2=Counter(d2)
>>> t=c1&c2
>>> t
Counter({'t': 4, 'a': 3, 'h': 2})
>>> t.most_common(2)
[('t', 4), ('a', 3)]
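The same idea can be applied to text directly. The following is a minimal sketch, not the asker's exact pipeline: the helper names tokenize and top_common_words, the simplified tokenizing regex, and the two sample strings are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of ASCII letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def top_common_words(text_a, text_b, n):
    # & keeps only words present in both counters, each with the
    # smaller of its two counts.
    shared = Counter(tokenize(text_a)) & Counter(tokenize(text_b))
    return shared.most_common(n)


text_a = "Essays are scholarly. Essays are short."
text_b = "Essays can be long, and essays are often assigned."
print(top_common_words(text_a, text_b, 2))  # [('essays', 2), ('are', 1)]
```

Words that occur in only one of the texts, such as "scholarly" or "long", drop out of the result entirely.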

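Note that & keeps the minimum count of each common word. If you want the maximum count instead, take the union with |, and then use a dict comprehension to keep only the common words:

>>> m=c1|c2
>>> m
Counter({'t': 6, 'a': 5, 'h': 5, 'b': 2, 'f': 2, 'c': 1, 'e': 1})
>>> max_counts={i:j for i,j in m.items() if i in t}
>>> max_counts
{'a': 5, 'h': 5, 't': 6}

And if you want the sum of the counts of the common words, add the two results together (maximum plus minimum):

>>> s=Counter(max_counts)+t
>>> s
Counter({'t': 10, 'a': 8, 'h': 7})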
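The last step works because, for any word, the smaller count plus the larger count equals the sum of both counts. Here is a short sketch that checks this against a direct computation, reusing the example data above; the variable names low, high and direct are illustrative only.

```python
from collections import Counter

c1 = Counter({'a': 5, 'f': 2, 'c': 1, 'h': 2, 't': 4})
c2 = Counter({'a': 3, 'b': 2, 'e': 1, 'h': 5, 't': 6})

# Minimum count per word that appears in both counters.
low = c1 & c2
# Maximum count, restricted to the words that appear in both counters.
high = Counter({k: v for k, v in (c1 | c2).items() if k in low})
# Sum of both counts, computed directly for the shared words.
direct = Counter({k: c1[k] + c2[k] for k in low})

assert low + high == direct
print(direct.most_common())  # [('t', 10), ('a', 8), ('h', 7)]
```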

abarnert · Answer 2 · 2015-06-05T08:01:43+0000

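The question is ambiguous.

One possibility is that you want the words that are most common across the two files in total, so that, for example, a word that appears 10000 times in file 1 and 1 time in file 2 counts as appearing 10001 times. If so: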

mc = Counter(words) + Counter(words1) # or Counter(chain(words, words1))
mos = mc.most_common(5)
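A small sketch of this first interpretation, with made-up word lists standing in for the two files. It also checks that adding two counters and counting the chained lists give the same result.

```python
from collections import Counter
from itertools import chain

# Hypothetical word lists standing in for the two files.
words = ["essay", "essay", "prose", "verse"]
words1 = ["essay", "prose", "locke"]

summed = Counter(words) + Counter(words1)
chained = Counter(chain(words, words1))

assert summed == chained
print(summed.most_common(2))  # [('essay', 3), ('prose', 2)]
print(summed["locke"])        # 1: counted although it is in only one list
```

Under this rule a word does not have to appear in both files to be ranked, which may or may not be what the question intends.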

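Another possibility is that you want only the words that appear in both files, ranked by the larger of their two counts: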

mc = Counter(words)
mc1 = Counter(words1)
mcmerged = Counter({word: max(mc[word], mc1[word]) for word in mc if word in mc1})
mos = mcmerged.most_common(5)

Or maybe you want only the words that appear in both files, ranked by their total count:

mc = Counter(words)
mc1 = Counter(words1)
mcmerged = Counter({word: mc[word] + mc1[word] for word in mc if word in mc1})
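To see how much the choice of rule matters, here is a sketch that ranks the shared words under three rules (smaller count, larger count, total). The helper rank_shared and the sample data are hypothetical; the data echoes the 10000 versus 1 case mentioned earlier in this answer.

```python
from collections import Counter


def rank_shared(words_a, words_b, combine, n):
    # Rank words present in both lists by combine(count_a, count_b).
    ca, cb = Counter(words_a), Counter(words_b)
    merged = Counter({w: combine(ca[w], cb[w]) for w in ca if w in cb})
    return merged.most_common(n)


# "x" is very frequent in one list and rare in the other;
# "y" is moderately frequent in both; "z" and "q" are not shared.
words_a = ["x"] * 10000 + ["y"] * 5 + ["z"]
words_b = ["x"] + ["y"] * 5 + ["q"]

print(rank_shared(words_a, words_b, min, 1))                 # [('y', 5)]
print(rank_shared(words_a, words_b, max, 1))                 # [('x', 10000)]
print(rank_shared(words_a, words_b, lambda a, b: a + b, 1))  # [('x', 10001)]
```

Ranking by the smaller count favors words that are frequent in both files, while the other two rules let a single lopsided word win.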

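There are other ways to interpret the question, too. If you can state the rule you want in plain English, it should be easy to turn into Python; if you cannot, no code will help.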

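As a side note, you need to change mc = Counter(words).most_common() to mc = Counter(words) before you can write mc = Counter(words) + Counter(words1) and so on. most_common() called on a Counter returns a list, not a Counter, so call it only at the end, on the combined Counter.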
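A brief sketch of why this matters, using made-up word lists: the result of most_common() is a plain list of (word, count) pairs, so the Counter operators no longer apply to it.

```python
from collections import Counter

# Hypothetical word lists standing in for the two files.
words = ["a", "a", "b"]
words1 = ["a", "c"]

ranked = Counter(words).most_common()
print(type(ranked).__name__)  # list

try:
    ranked & Counter(words1).most_common()
except TypeError as exc:
    # Lists do not support the & operator.
    print("cannot intersect lists:", exc)

# Combine the Counter objects first, then rank once at the end.
shared = Counter(words) & Counter(words1)
print(shared.most_common(1))  # [('a', 1)]
```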
