I am using difflib to find every occurrence of a short string in a longer sequence. However, when there are several matches, difflib seems to return only one:
> import difflib
> sm = difflib.SequenceMatcher(None, a='ACT', b='ACTGACT')
> sm.get_matching_blocks()
[Match(a=0, b=0, size=3), Match(a=3, b=7, size=0)]
Expected Result:
[Match(a=0, b=0, size=3), Match(a=0, b=4, size=3), Match(a=3, b=7, size=0)]
In fact, the string ACTGACT contains two matches for ACT, at positions 0 and 4, both of size 3 (plus another match of size 0 at the end of the strings).
How can I get all matches? I expected difflib to return both positions.
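To make the expected positions concrete, here is a minimal sketch that finds them with a plain `str.find` loop instead of difflib. The helper name `find_all_occurrences` is hypothetical and not part of difflib; it only illustrates the result being asked for (exact matches, overlaps allowed).

```python
def find_all_occurrences(needle, haystack):
    """Return the start index of every occurrence of needle in haystack.

    Hypothetical helper for illustration; uses only str.find.
    Overlapping occurrences are reported as well.
    """
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit so overlaps are not skipped.
        start = haystack.find(needle, start + 1)
    return positions


print(find_all_occurrences('ACT', 'ACTGACT'))  # [0, 4]
```

This is not the same thing that `get_matching_blocks()` computes: as far as I understand, that method describes a single alignment of `a` against `b`, so each part of `a` appears in at most one block.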