In Perl, that would be pretty simple. The first step is to normalize each regular expression by changing A+ to AA* , A*A to AA* and A*A* to A* :
sub normalize_regex($) { local $_ = shift; s/(.)\+/$1$1*/g; 1 while s/(.)\*\1(?!\*)/$1$1*/g or s/(.\*)\1/$1/g; return $_; }
Step two is to convert the first regular expression from the regular expression that matches the strings itself to Perl-regex, which matches the normalized regular expressions that match those strings; for example, AA*B will be converted to ^AA*\*?B$ , which means βbeginning of lineβ, followed by A, followed by zero or more A, optionally followed by an asterisk, followed by B, and then the end of the line ":
sub regex_to_metaregex($) { local $_ = shift; s/(.)(\*?)/$2 ? "\Q$1\E*(\Q$1\E\\*)?" : "\Q$1"/eg; return qr/^$_$/; }
Step three needs no explanation:
sub does_regex1_cover_regex2($$) { my ($r1, $r2) = @_; $r1 = regex_to_metaregex normalize_regex $r1; $r2 = normalize_regex $r2; return scalar $r2 =~ m/$r1/; }
This returns the true value for your cases # 1 - 3. However, it returns a false value for your case # 4, because if I really lose nothing, A+M+BC* does not cover AMM+B+C+M+BC* ?
Note that there should also be a way to avoid duplicate characters "*" and "+" in the input lines strA and strB.
I was not worried about this in the above code, but since you are only concerned with ASCII, the preprocessing step can process the \* value * , \+ , which means + and \\ , which means \ , translating them into separate characters outside the ASCII range:
sub process_escapes($) { local $_ = shift; s/\\\\/\x80/g; s/\\\+/\x81/g; s/\\\*/\x82/g; s/\x80/\\/g; return $_; }
(although this is clearly pretty hacky).
In C ++ you can use the same approach - there are libraries that implement all the necessary functions of Perl regexes - although obviously this will be a bit more.