Regular Expression Algorithm

Given a substring, is there a way to generate all possible regular expressions (the most restrictive, the least restrictive) that would match this substring for a given string?

For example, let's say you have the substring โ€œorangeโ€ and the string โ€œapple banana orange grapesโ€. How to get a list of regular expressions that match "orange" (I know there will be many, hoping that there is already a library for me).

+4
source share
1 answer

This is basically the same as asking "given some runtime requirements, is there a way to generate all possible programs (most efficient for least efficient) that would meet these requirements for a given input?" The answer is yes, there is a way to do this, but the number of results is infinite, limited only by the boundaries of reasonable memory and restrictions on the implementation of the language, so you will need to impose restrictions on what constitutes a real "program" "for your purposes, in order to reduce it to a finite set .

For example, you can limit yourself to a specific grammar, varieties representing a subset of the regular expression language in question, and generate only regular expressions that match this grammar:

  Regex :: = StartAnchor?  Anything?  (Substring | Anything) Anything?  Endanchor?

 StartAnchor :: = "^"

 Anything :: = ". *"
               |  "(. *)"

 Substring :: = "orange"
               |  "(orange)"

 EndAnchor :: = "$"

Recursively take all the paths of this grammar (i.e., each branch labeled ? And | ) to generate all your target regular expressions. Of course, this answer deliberately says nothing about whether this is a good idea or even necessary ...

+5
source

All Articles