Search for string * and * its substring in haystack

Suppose you have a string (e.g. needle ). Its 19 continuous substrings:

 needle needl eedle need eedl edle nee eed edl dle ne ee ed dl le nedl 

If I were to create a regular expression to match, in a haystack, any of the substrings that I could just do:

 /(needle|needl|eedle|need|eedl|edle|nee|eed|edl|dle|ne|ee|ed|dl|le|n|e|d|l)/ 

but it doesn’t look very elegant. Is there a better way to create a regular expression that will greedily match any of the substrings of a given string?

Besides, if I set another restriction, I wanted to combine only substrings that exceed the threshold value, for example. for substrings of at least 3 characters:

 /(needle|needl|eedle|need|eedl|edle|nee|eed|edl|dle)/ 

Note: I did not specifically mention any particular regular expression dialect. Indicate which one you use in your answer.

+8
string substring regex
source share
4 answers

As suggested by Qtax, the expression

n(e(e(d(l(e)?)?)?)?)?|e(e(d(l(e)?)?)?)?|e(d(l(e)?)?)?|d(l(e)?)?|l(e)?|e

is the way to go if you want to write an explicit regular expression ( egrep syntax, it is not necessary to replace (...) with (?:...) )). The reason this is better than the original solution is because the compressed version only requires O (n ^ 2) space compared to the O (n ^ 3) space in the original version, where n is the input length. Try this with extraordinarily as input to see the difference. I guess the condensed version also works faster with many regexp engines.

Expression

nee(d(l(e)?)?)?|eed(l(e)?)?|edl(e)?|dle

will search for substrings of length 3 or more.

As vhallac pointed out, the generated regular expressions are a bit redundant and can be optimized. In addition to the proposed Emacs tool, there is a Perl Regexp :: Optimizer package that I hope will help here, but a quick check failed for the first regular expression.

Note that many regexp engines perform non-overlapping searches by default. Check this with the requirements of your problem.

+4
source share

I found an almost elegant solution, depending on how much you need only one regular expression. For example, here is regexp, which finds a common substring (perl) of length 7:

 "$needle\0$heystack" =~ /(.{7}).*?\0.*\1/s 

The corresponding line is in \ 1 . Strings must not contain a null character, which is used as a delimiter.

You have to make a loop starting with the length of the needle and going down and trying to match the regular expression.

+3
source share

Is there a better way to create a regex that matches any substring of a given string?

Not. But you can easily generate such an expression.

+1
source share

Maybe you're just looking .*(.{1,6}).*

-2
source share

All Articles