Regex: ignore extra characters

I am trying to figure out how to detect extra characters in a spam word, for example:

pha.rmacy or vi*agra

Any ideas?

+5
3 answers

You can use a similarity (distance) metric, such as edit distance. For example, the edit distance between vi.agra and viagra is 1.

You then treat a given word as a match for the spam word if the edit distance between them is below a certain threshold, for example, 2.
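As a rough illustration of the threshold idea, here is a minimal Python sketch. It assumes "edit distance" means Levenshtein distance, and the names `edit_distance`, `looks_like_spam` and `SPAM_WORDS` are made up for this example, not taken from any library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical word list, for illustration only.
SPAM_WORDS = ["viagra", "pharmacy"]


def looks_like_spam(word: str, threshold: int = 2) -> bool:
    """True if word is within (strictly below) threshold edits of a spam word."""
    lowered = word.lower()
    return any(edit_distance(lowered, spam) < threshold for spam in SPAM_WORDS)


print(edit_distance("vi.agra", "viagra"))
print(looks_like_spam("pha.rmacy"), looks_like_spam("hello"))
```

A low threshold keeps false positives down, but short legitimate words that sit one edit away from a spam word will still be flagged, so the threshold probably needs tuning against real data.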

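Alternatively, you could strip out every character that matches /[^a-zA-Z0-9-\s]/ before comparing. That will not catch a substitution such as viZagra, though, whereas the edit distance to viagra would.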
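One way to use the character class /[^a-zA-Z0-9-\s]/ is to delete everything it matches before comparing against the spam word. The Python sketch below works under that assumption; `strip_noise` is a hypothetical helper name.

```python
import re

# Same character class as in the answer: anything that is not a letter,
# a digit, a hyphen or whitespace.
NOISE = re.compile(r"[^a-zA-Z0-9-\s]")


def strip_noise(text: str) -> str:
    """Remove every character matched by NOISE."""
    return NOISE.sub("", text)


print(strip_noise("vi*agra"))    # punctuation is removed
print(strip_noise("viZagra"))    # an inserted letter survives
```

Note that the class deliberately leaves hyphens and whitespace alone, so a variant like vi-agra passes through unchanged, as does any variant that inserts a letter or digit.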

+3

Another option is to allow an optional extra character between each pair of letters in the word:

/v.?i.?a.?g.?r.?a/

Each .? matches 0 or 1 extra characters.
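To see what /v.?i.?a.?g.?r.?a/ accepts and rejects, here is a small Python sketch; `matches_loose` is a name invented for the example.

```python
import re

# The pattern from the answer: at most one arbitrary character
# (anything except a newline) may sit between consecutive letters.
LOOSE_VIAGRA = re.compile(r"v.?i.?a.?g.?r.?a")


def matches_loose(text: str) -> bool:
    """True if the loose pattern occurs anywhere in text."""
    return LOOSE_VIAGRA.search(text) is not None


for candidate in ["viagra", "vi*agra", "v.i.a.g.r.a", "viZagra", "vi**agra"]:
    print(candidate, matches_loose(candidate))
```

Because . matches letters as well as punctuation, viZagra is caught, but two inserted characters in a row (vi**agra) are not.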

+2

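Alternatively, allow any number of characters that are neither whitespace nor word characters between the letters, for example: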

/p[^\s\w]*h[^\s\w]*a[^\s\w]*r[^\s\w]*m[^\s\w]*a[^\s\w]*c[^\s\w]*y/

Such a pattern can be generated rather than typed by hand, for example in Perl:

my $re = join("[^\\s\\w]*", split(//, "pharmacy"));
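For readers not using Perl, the same construction can be sketched in Python. The function name `obfuscation_pattern` is hypothetical, and `re.escape` is an added precaution for words containing regex metacharacters; it is not part of the Perl one-liner.

```python
import re


def obfuscation_pattern(word: str) -> re.Pattern:
    """Build a regex that matches word with any run of non-space,
    non-word characters allowed between its letters."""
    separator = r"[^\s\w]*"
    return re.compile(separator.join(re.escape(ch) for ch in word))


pharmacy = obfuscation_pattern("pharmacy")
print(pharmacy.pattern)
print(bool(pharmacy.search("cheap pha.rmacy online")))
print(bool(pharmacy.search("phaZrmacy")))
```

Since `\w` covers letters, digits and the underscore, this pattern tolerates punctuation between letters but not inserted letters, digits, underscores or spaces.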


+1