How to get multiple matches with difflib.SequenceMatcher?

I am using difflib to find every occurrence of a short string within a longer sequence. However, when there are several matches, difflib seems to return only one:

>>> import difflib
>>> sm = difflib.SequenceMatcher(None, a='ACT', b='ACTGACT')
>>> sm.get_matching_blocks()
[Match(a=0, b=0, size=3), Match(a=3, b=7, size=0)]

Expected result:

[Match(a=0, b=0, size=3), Match(a=0, b=4, size=3), Match(a=3, b=7, size=0)]

The string ACTGACT contains two occurrences of ACT, at positions 0 and 4, both of size 3 (plus the terminating match of size 0 at the end of both strings).

How can I get all of the matches? I expected difflib to return both positions.

+4
2 answers

, k-nut , . k-nut , . , , , "" / (. , "" , ).

- , BLAST / Smith-Waterman . Python - , , BioPython, , , , , NCBI BLAST +. "" Python, BLAST, FSA-BLAST.

, ( , BLAST), , (B ), - (SW). , , BLAST, SW .

SW Python, pure-Python, ( swalign GitHub, ). Python, scikit-bio SW, scikit-bio -. SW WikiPedia, , , , SIMD- CUDA ++. Python, SSWlib.
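For reference, the Smith-Waterman (SW) local alignment named in this answer can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the function name `smith_waterman_hits`, the scoring values (match=2, mismatch=-1, gap=-2) and the linear gap penalty are choices made for this example, not taken from any of the libraries mentioned. It reports every position where the best local alignment score is reached, which is what finds both ACT hits.

```python
def smith_waterman_hits(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, ends), where ends lists the (a_end, b_end)
    positions (exclusive) of every cell that reaches the best local score.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    best = 0
    ends = []
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, ends = cur[j], [(i, j)]
            elif cur[j] == best and best > 0:
                ends.append((i, j))
        prev = cur
    return best, ends


score, ends = smith_waterman_hits("ACT", "ACTGACT")
print(score, ends)  # 6 [(3, 3), (3, 7)]
```

The two end positions correspond to the matches starting at offsets 0 and 4 in ACTGACT. A pure-Python loop like this runs in time proportional to len(a) * len(b), so for long sequences an optimized implementation is the more realistic choice.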

+2

Why use difflib at all? A regular expression should be enough.

import re
pattern = "ACT"
text = "ACTGACT"
matches = [m.span() for m in re.finditer(pattern, text)]

Output:

[(0, 3), (4, 7)]
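Two caveats with the snippet above: the pattern is interpreted as a regular expression, and `re.finditer` skips occurrences that overlap a previous one. A small sketch that handles both follows; the helper name `find_all` and its `overlapping` flag are made up for this example.

```python
import re


def find_all(pattern, text, overlapping=False):
    """Return (start, end) spans of every literal occurrence of pattern in text."""
    literal = re.escape(pattern)  # treat the pattern as plain text, not as a regex
    if overlapping:
        # A zero-width lookahead consumes no characters, so the scan resumes
        # one position later and occurrences that share characters are kept.
        regex = re.compile("(?=(" + literal + "))")
        return [(m.start(1), m.end(1)) for m in regex.finditer(text)]
    return [m.span() for m in re.finditer(literal, text)]


print(find_all("ACT", "ACTGACT"))                # [(0, 3), (4, 7)]
print(find_all("AA", "AAAA"))                    # [(0, 2), (2, 4)]
print(find_all("AA", "AAAA", overlapping=True))  # [(0, 2), (1, 3), (2, 4)]
```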

- , ? , , , difflib, .
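If difflib is a hard requirement, one possible workaround is to call `SequenceMatcher.find_longest_match` repeatedly while moving the lower bound of the search window in `b`. This is a sketch rather than an established recipe: the helper name `all_full_matches` is invented here, and it only reports places where the whole short string matches.

```python
import difflib


def all_full_matches(needle, haystack):
    """Return the start offsets in haystack where the entire needle occurs."""
    # autojunk=False turns off the heuristic that treats very frequent items
    # as junk in sequences of 200 or more elements.
    sm = difflib.SequenceMatcher(None, a=needle, b=haystack, autojunk=False)
    starts = []
    lo = 0
    while needle and lo < len(haystack):
        m = sm.find_longest_match(0, len(needle), lo, len(haystack))
        if m.size < len(needle):
            break  # no complete occurrence is left in haystack[lo:]
        starts.append(m.b)
        lo = m.b + 1  # step one past this hit so overlapping hits are found too
    return starts


print(all_full_matches("ACT", "ACTGACT"))  # [0, 4]
```

This relies on the documented tie-breaking of `find_longest_match`: among equally long blocks it returns the one that starts earliest in `a`, then earliest in `b`, so each call yields the next complete occurrence.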

+2
