Arbitrary creation of a string that does NOT match the given regular expression

Question

Arbitrary creation of a string that does NOT match the given regular expression

For testing purposes, in the project I'm working on, I need to, if the regex is provided, arbitrarily create a string that matches FAIL . For example, if I am given this regular expression:

^[abcd]d+

Then I should be able to generate strings such as:

 hnbbad uduebbaef 9f8;djfew skjcc98332f

... each of which does NOT match the regular expression, but does NOT generate:

 addr32 bdfd09usdj cdddddd-9fdssee

... each of which is DO. In other words, I want something like anti-Xeger.

Is there such a library, preferably in Python (if I can understand the theory, I can most likely convert it to Python, if necessary)? I thought about how I could write this, but given the volume of regular expressions, it seemed like it could be much more complicated than what Xeger could solve. I also looked at the library I created earlier, but either I am not using the correct keywords to search, or no one had this problem before.

+6

python regex

Captainspam Nov 09 '12 at 21:58

source share

4 answers

I would loop through generating random combinations of random length, and check if the regex matches. Repeat the cycle until a situation of non-compliance is reached.

Obviously, this would be inefficient. Are you sure you cannot invert the regular expression and create a match in the inverted regular expression?

+3

Fèlix galindo allué Nov 09 '12 at 10:06

source share

No, It is Immpossible. There are an infinite number of regular expressions that correspond to each line in a known universe. For instance:

/^/
/.*/
/[^"\\]*(\\.[^"\\]*)*$/

and etc.

This is because all of these regular expressions cannot match anything at all (something has all the lines!)

+1

ridgerunner Nov 10 '12 at 6:28

source share

Can we reduce the infinite number of possibilities by limiting the creation of strings from a set of assignment characters.

For example, I can define a character set, [ QWERTYUIOP!@ #$%^%^&*))_] , and all the lines that I generate arbitrarily should be born from this set. So we can reduce the infinite nature of this problem?

In fact, even I am looking for such a utility, preferably in Python.

0

Priya Ananthasankar Apr 6 '13 at 14:27

source share

Mark byers · Accepted Answer · 2012-11-09T22:04:13+0000

My initial instinct: no, such a library does not exist, because it is impossible. You cannot be sure that you can find a valid input for any arbitrary regular expression in a reasonable amount of time.

For example, proving whether a number is prime is difficult to solve a mathematical problem. The following regular expression matches any string with a length of at least 10,000 characters and whose total length is a prime:

 (?!(..+)\1+$).{10000}

I doubt that there is any library that can find the correct input for this regular expression in a reasonable amount of time. And this is a very simple example with a simple solution, for example. 'x' * 10007 will work. It would be possible to come up with other regular expressions that are much harder to find valid values for.

I think the only way to solve this problem is to limit yourself to a certain subset of all possible regular expressions.

But, saying that if you have a magic library that generates text that matches any arbitrary regular expression, then all you have to do is create a regular expression that matches all the lines that don't match your original expression.

Fortunately, this is possible with a negative view:

 ^(?![\s\S]*(?:^[abcd]d+))

If you want to change the requirements to allow a limited subset of regular expressions, you can negate the regular expression using logical logic. For example, if ^[abcd]d+ becomes ^[^abcd]|^[abcd][^d] . You can then find the correct input for this regular expression in a reasonable amount of time.

Arbitrary creation of a string that does NOT match the given regular expression

More articles: