Regular expression: listing all possible matches

Given a regex, how can I list all possible matches? For example, given AB[CD]1234, I want it to return a list like: ABC1234 ABD1234

I searched on the internet but didn't find anything.

+7
regex
9 answers

The reason you didn't find anything is probably that this is a seriously complex problem, given the number of combinations some expressions allow. Some regular expressions even match infinitely many strings.

Consider the following expressions:

AB[A-Z0-9]{1,10}1234
AB.*1234
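To get a feel for the scale: the first expression above is finite but enormous, while the second has no upper bound at all. A quick count for the first one, assuming the set [A-Z0-9] means exactly 36 characters (the variable names are just for illustration):

```python
# Count the strings matched by AB[A-Z0-9]{1,10}1234.
# Assumption: the set holds 26 letters + 10 digits = 36 characters.
alphabet_size = 26 + 10

# One term per allowed repetition count k = 1..10.
total = sum(alphabet_size ** k for k in range(1, 11))

print(f"{total:,} possible matches")
```

That is more than 3 quadrillion strings, so listing them all is not practical even though the set is finite.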

I think it would be best to write the algorithm yourself, supporting only a small subset of the regex syntax. In your particular case, I would suggest a more naive approach than regex.

+8

For some simple regular expressions like the one you provided (AB[CD]1234), there is a finite set of matches. But for other expressions (AB[CD]*1234), the number of possible matches is unbounded.

One way to identify all the possibilities is to determine where in the regular expression there is a choice. For each possible choice, create a new regular expression based on the original regular expression and the current selection. This new regular expression is now slightly simpler than the original.

For an expression like "A[BC][DE]F", the method would proceed as follows:

 getAllMatches("A[BC][DE]F")
   = getAllMatches("AB[DE]F") + getAllMatches("AC[DE]F")
   = getAllMatches("ABDF") + getAllMatches("ABEF") + getAllMatches("ACDF") + getAllMatches("ACEF")
   = "ABDF" + "ABEF" + "ACDF" + "ACEF"
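A minimal Python sketch of this expansion, assuming the pattern contains only literal characters and simple sets such as [BC] (no ranges, quantifiers, or escapes). The name get_all_matches just mirrors the pseudocode and is not from any library:

```python
def get_all_matches(pattern):
    """Expand a pattern made only of literals and simple sets like [CD]."""
    start = pattern.find("[")
    if start == -1:
        # No choice left: the pattern is already a plain string.
        return [pattern]
    end = pattern.index("]", start)
    results = []
    for choice in pattern[start + 1:end]:
        # Substitute one member of the set, then recurse on the simpler pattern.
        simpler = pattern[:start] + choice + pattern[end + 1:]
        results.extend(get_all_matches(simpler))
    return results


print(get_all_matches("A[BC][DE]F"))  # ['ABDF', 'ABEF', 'ACDF', 'ACEF']
```

Each recursive call removes one set, so the recursion always terminates for this restricted syntax.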
+3

Impossible.

Really.

Consider lookahead assertions. And what about a pattern like .* : how would you generate all the possible strings matching that regular expression?

+2

A regular expression is intended to do nothing more than match a pattern. That said, a regular expression will never "list" anything; it only matches. If you want a list of all possible matches, I believe you will need to generate it yourself.

+2

You can write an algorithm for this, but it will only work for regular expressions that have a finite set of possible matches. Your regular expressions will be limited to using:

  • Optional: ?
  • Characters: . \d \D
  • Sets: for example [1a-c]
  • Negated sets: [^2-9d-z]
  • Alternation: |
  • Positive lookarounds

So your regular expressions CANNOT use:

  • Repeaters: * +
  • Word patterns: \w \W
  • Negative lookarounds
  • Some statements of zero width: ^ $

And there are some others (word boundaries, lazy and greedy quantifiers) that I'm not sure about yet.

As for the algorithm itself, another user posted a link to this answer, which describes how to create it.
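For a small part of the finite subset listed above (literals, sets with ranges, and ?), one possible sketch builds a list of alternatives per position and takes their Cartesian product. This is an illustration under those assumptions, not the algorithm from the linked answer; parse_set and enumerate_matches are made-up names, and negated sets, alternation, and lookarounds are not handled:

```python
from itertools import product


def parse_set(body):
    """Expand the inside of a set, e.g. '1a-c' -> ['1', 'a', 'b', 'c']."""
    chars, i = [], 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-c: add every character from a to c.
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def enumerate_matches(pattern):
    """List all strings matched by a pattern of literals, sets, and '?'."""
    slots, i = [], 0  # each slot holds the alternatives for one position
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)
            options = parse_set(pattern[i + 1:end])
            i = end + 1
        else:
            options = [pattern[i]]
            i += 1
        if i < len(pattern) and pattern[i] == "?":
            # Optional item: the empty string is one more alternative.
            options = [""] + options
            i += 1
        slots.append(options)
    return ["".join(combo) for combo in product(*slots)]


print(enumerate_matches("AB?[1a-c]"))
# ['A1', 'Aa', 'Ab', 'Ac', 'AB1', 'ABa', 'ABb', 'ABc']
```

Because every slot has a finite number of alternatives, the result is always finite, which is exactly why repeaters such as * and + are excluded.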

+2

Exrex can do this:

 $ python exrex.py 'AB[CD]1234'
 ABC1234
 ABD1234
+2

You may be able to find some code that lists all possible matches for something as simple as your example. But for most regular expressions, you wouldn't even want to try to list all possible matches.

For example, AB.*1234 matches AB followed by absolutely anything, and then 1234.

+1

I am not entirely sure that this is even possible, but if it were, then in many situations it would be so complex or time-consuming as to be useless.

For example, try to list all matches for A.*Z

There are sites that help in creating a good regular expression:

0

Well, you can convert the regular expression into an equivalent finite state machine (relatively simple, and it can be done algorithmically), and then recursively find all possible paths through the FSM, outputting the string spelled out by each path. This is neither very complicated nor computationally intensive per output (although you will usually get a HUGE amount of output). However, you must take care to exclude potentially endless passes (e.g. .*). This can be done by setting a maximum permissible path length, after which the trace is aborted.
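The bounded traversal can be sketched as follows. This is only a sketch: the machine for A[BC]*D is written by hand (the regex-to-FSM conversion is not shown), it is assumed to be deterministic, and the names enumerate_paths, transitions, and max_length are made up for illustration.

```python
def enumerate_paths(transitions, start, accepting, max_length):
    """Yield every accepted string of at most max_length symbols."""
    def walk(state, prefix):
        if state in accepting:
            yield prefix
        if len(prefix) == max_length:
            return  # abort the trace here to cut off endless loops
        for symbol, target in sorted(transitions.get(state, {}).items()):
            yield from walk(target, prefix + symbol)

    yield from walk(start, "")


# Hand-built deterministic machine for A[BC]*D (hypothetical example).
dfa = {
    0: {"A": 1},
    1: {"B": 1, "C": 1, "D": 2},
    2: {},
}

matches = list(enumerate_paths(dfa, start=0, accepting={2}, max_length=4))
print(matches)
```

With max_length=4 this yields the seven strings AD, ABD, ACD, ABBD, ABCD, ACBD, and ACCD (in depth-first order); raising the limit grows the output quickly, which matches the warning about huge output.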

0
