I was asked today whether there is a library that takes a list of strings and computes the most efficient regular expression that matches only those strings. I think that is an NP-complete problem in itself, but I think we can narrow the scope a bit.
How would I generate and simplify a regex to match a subset of hosts from the larger set of all hosts on my network? (Knowing that I might not get the most efficient regular expression.)
The first step is very simple. From the following list:
- appserver1.domain.tld
- appserver2.domain.tld
- appserver3.domain.tld
I can escape and concatenate them into
appserver1\.domain\.tld|appserver2\.domain\.tld|appserver3\.domain\.tld
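For illustration, this first step can be sketched in a few lines. The sketch below uses Python only as a neutral example language; the function name `naive_pattern` is hypothetical, and the same idea carries over to Perl (`quotemeta` plus `join`) or JavaScript.

```python
import re

hosts = [
    "appserver1.domain.tld",
    "appserver2.domain.tld",
    "appserver3.domain.tld",
]

def naive_pattern(names):
    # Escape regex metacharacters (here, the dots) in each name,
    # then join the escaped names as alternatives with "|".
    return "|".join(re.escape(name) for name in names)

pattern = naive_pattern(hosts)
print(pattern)
```

When matching, the whole string should be anchored (for example with `re.fullmatch`), otherwise the alternation would also match longer host names that merely contain one of the listed names.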
And I know how to manually simplify the regular expression into
appserver[123]\.domain\.tld
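The manual simplification amounts to factoring out what all the names share. A minimal sketch of that idea, assuming it is enough to pull out one common prefix and one common suffix (the function name `factor_pattern` is hypothetical, and this is not claimed to produce the most efficient regex):

```python
import os
import re

hosts = [
    "appserver1.domain.tld",
    "appserver2.domain.tld",
    "appserver3.domain.tld",
]

def factor_pattern(names):
    if not names:
        raise ValueError("need at least one name")
    # Longest common prefix (os.path.commonprefix compares character by character).
    prefix = os.path.commonprefix(names)
    rests = [n[len(prefix):] for n in names]
    # Longest common suffix of what is left, found by reversing the strings.
    suffix = os.path.commonprefix([r[::-1] for r in rests])[::-1]
    # The distinct parts that differ between names.
    middles = sorted({r[:len(r) - len(suffix)] for r in rests})
    if all(len(m) == 1 for m in middles):
        # Single differing characters collapse into a character class.
        core = "[" + "".join(re.escape(m) for m in middles) + "]"
    else:
        # Otherwise fall back to a non-capturing alternation.
        core = "(?:" + "|".join(re.escape(m) for m in middles) + ")"
    return re.escape(prefix) + core + re.escape(suffix)

print(factor_pattern(hosts))
```

For the three hosts above this yields the same expression as the manual simplification. It only factors once, so names that share structure in several places (say, a varying number and a varying subdomain) would need a recursive or trie-based approach.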
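Beyond this simple case of 3 hosts, is there a library (Perl, JavaScript, or C#) that can do this simplification automatically?
A Perl or JavaScript solution would be preferred.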