Python: matching words at the same index in two strings

I have two strings with the same number of words, and I want to collect the words that match at the same index. I also want to join consecutive matches into a single entry, which is where I am having problems.

For example, I have two strings:

    alligned1 = 'I am going to go to some show'
    alligned2 = 'I am not going to go the show'

I am looking to get the result:

 ['I am','show'] 

My current code is as follows:

    keys = []
    for x in alligned1.split():
        for i in alligned2.split():
            if x == i:
                keys.append(x)

This gives me:

 ['I','am','show'] 

Any guidance or help would be appreciated.

4 answers

Finding the matching words is fairly simple, but collecting them into groups of adjacent matches is harder. I suggest using itertools.groupby.

    import itertools

    alligned1 = 'I am going to go to some show'
    alligned2 = 'I am not going to go the show'

    results = []
    # Pair up the words that share an index
    word_pairs = zip(alligned1.split(), alligned2.split())
    # groupby yields one group per run of consecutive matches (k is True)
    # or consecutive mismatches (k is False)
    for k, v in itertools.groupby(word_pairs, key=lambda pair: pair[0] == pair[1]):
        if k:
            words = [pair[0] for pair in v]
            results.append(" ".join(words))
    print(results)

Result:

 ['I am', 'show'] 
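
To make it easier to see what groupby is doing, here is a minimal sketch that wraps the same idea in a function and prints every run as it is produced. The name `matching_runs` is made up for illustration; it is not from any library.

```python
from itertools import groupby


def matching_runs(line1, line2):
    """Return runs of consecutive words that are equal at the same index.

    Hypothetical helper; it wraps the groupby approach from this answer.
    """
    pairs = zip(line1.split(), line2.split())
    runs = []
    # groupby starts a new group every time the key (match / no match) flips
    for is_match, group in groupby(pairs, key=lambda pair: pair[0] == pair[1]):
        words = [first for first, _ in group]
        print(is_match, words)  # show each run, matching or not
        if is_match:
            runs.append(" ".join(words))
    return runs


result = matching_runs('I am going to go to some show',
                       'I am not going to go the show')
print(result)
```

Since groupby only groups adjacent items, the input must not be sorted first; the runs depend on the original word order.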

A simplified version of your code, comparing words by position, would be:

    alligned1 = 'I am going to go to some show'
    alligned2 = 'I am not going to go the show'

    keys = []
    for i, word in enumerate(alligned1.split()):
        if word == alligned2.split()[i]:
            keys.append(word)

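Then we need to track whether we have just matched a word. This can be done with a state variable, prev, that accumulates the current run of matches.

    alligned1 = 'I am going to go to some show'
    alligned2 = 'I am not going to go the show'

    keys = []
    prev = ''
    for i, word in enumerate(alligned1.split()):
        if word == alligned2.split()[i]:
            # Extend the current run of matching words
            prev = prev + ' ' + word if prev else word
        elif prev:
            # A mismatch ends the run
            keys.append(prev)
            prev = ''
    # Flush a run that reaches the end of the string (e.g. 'show')
    if prev:
        keys.append(prev)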

Kevin's answer is the best and most accurate one. I tried a cruder approach. It doesn't look very good, but it does the job without any imports.

    alligned1 = 'I am going to go to some show'.split(' ')
    alligned2 = 'I am not going to go the show'.split(' ')

    keys = []
    # None marks a position where the two words differ
    temp = [v if v == alligned1[i] else None for i, v in enumerate(alligned2)]
    temp.append(None)  # sentinel so the last run gets flushed
    tmpstr = ''
    for i in temp:
        if i:
            tmpstr += i + ' '
        else:
            if tmpstr:
                keys.append(tmpstr)
            tmpstr = ''
    keys = [i.strip() for i in keys]
    print(keys)

Output:

 ['I am', 'show'] 

Maybe not very elegant, but it works:

    # Python 3 name; in Python 2 this is itertools.izip_longest
    from itertools import zip_longest

    alligned1 = 'I am going to go to some show'
    alligned2 = 'I am not going to go the show'

    curr_match = ''
    matches = []
    for w1, w2 in zip_longest(alligned1.split(), alligned2.split()):
        if w1 != w2:
            # A mismatch ends the current run, if there is one
            if curr_match:
                matches.append(curr_match)
                curr_match = ''
            continue
        if curr_match:
            curr_match += ' '
        curr_match += w1
    # Flush a run that reaches the end of the input
    if curr_match:
        matches.append(curr_match)
    print(matches)

Result:

 ['I am', 'show'] 
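
A note on the padded zip used here, in case the two lines ever differ in length: plain zip stops at the shorter input, while the "longest" variant pads the shorter one with None. For this task the result should be the same either way, because None never equals a real word. A small sketch of the difference, using the Python 3 name zip_longest and made-up sample lines:

```python
from itertools import zip_longest

# Made-up inputs of different lengths, for illustration only
short = 'I am here'.split()
longer = 'I am here today'.split()

zipped = list(zip(short, longer))          # stops after 3 pairs
padded = list(zip_longest(short, longer))  # 4 pairs, last one padded with None

print(zipped)
print(padded)
```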