Checking whether two patterns match each other?

This problem with Leetcode is how to map the template string to the text string as efficiently as possible. A pattern string can consist of letters, dots, and stars, where the letter matches only itself, the dot matches any single character, and the star matches any number of instances of the previous character. For example, a template

ab*c. 

will match ace and abbbbcc . I know that this original problem can be solved using dynamic programming.

My question is whether it is possible to see if the two patterns match each other. For example, a template

 bdbaa.* 

may match

 bdb.*daa 

Is there a good algorithm to solve this problem of pattern matching on a pattern?

+8
algorithm regex
source share
5 answers

I worked on my understanding of DP and came up with the following implementation of the above problem. Feel free to edit the code in case someone finds out that any test cases have failed. For my part, I tried several test cases and passed them all, which I will also discuss below.

Note that I have expanded on the idea that is used to solve regex pattern matching with string using DP. To address this idea, refer to the LeetCode link provided in the OP, and pay attention to part of the discussion. They gave an explanation for matching regex and string.

The idea is to create a dynamic memoization table whose records will follow the rules below:

  • If pattern1 [i] == pattern2 [j], dp [i] [j] = dp [i-1] [j-1]
  • If pattern1 [i] == '.' or pattern2 [j] == '.', then dp [i] [j] = dp [i-1] [j-1]
  • The trick lies here: if pattern1 [i] = '*', then if dp [i-2] [j] exists, then dp [i] [j] = dp [i-2] [j] || dp [i] [j-1] else dp [i] [j] = dp [i] [j-1].
  • If pattern2 [j] == '*', then if pattern1 [i] == pattern2 [j-1], then dp [i] [j] = dp [i] [j-2] || dp [I-1] [J] else dp [i] [j] = dp [i] [j-2]

pattern1 goes in rows, and pattern2 goes in columns. Also note that this code should also work for it to match the regular expression pattern with any string normally. I tested it by running it on LeetCode and it passed all the available test cases!

The following is a complete working implementation of the above logic:

 boolean matchRegex(String pattern1, String pattern2){ boolean dp[][] = new boolean[pattern1.length()+1][pattern2.length()+1]; dp[0][0] = true; //fill up for the starting row for(int j=1;j<=pattern2.length();j++){ if(pattern2.charAt(j-1) == '*') dp[0][j] = dp[0][j-2]; } //fill up for the starting column for(int j=1;j<=pattern1.length();j++){ if(pattern1.charAt(j-1) == '*') dp[j][0] = dp[j-2][0]; } //fill for rest table for(int i=1;i<=pattern1.length();i++){ for(int j=1;j<=pattern2.length();j++){ //if second character of pattern1 is *, it will be equal to //value in top row of current cell if(pattern1.charAt(i-1) == '*'){ dp[i][j] = dp[i-2][j] || dp[i][j-1]; } else if(pattern1.charAt(i-1)!='*' && pattern2.charAt(j-1)!='*' && (pattern1.charAt(i-1) == pattern2.charAt(j-1) || pattern1.charAt(i-1)=='.' || pattern2.charAt(j-1)=='.')) dp[i][j] = dp[i-1][j-1]; else if(pattern2.charAt(j-1) == '*'){ boolean temp = false; if(pattern2.charAt(j-2) == pattern1.charAt(i-1) || pattern1.charAt(i-1)=='.' || pattern1.charAt(i-1)=='*' || pattern2.charAt(j-2)=='.') temp = dp[i-1][j]; dp[i][j] = dp[i][j-2] || temp; } } } //comment this portion if you don't want to see entire dp table for(int i=0;i<=pattern1.length();i++){ for(int j=0;j<=pattern2.length();j++) System.out.print(dp[i][j]+" "); System.out.println(""); } return dp[pattern1.length()][pattern2.length()]; } 

Driver Method:

 System.out.println(e.matchRegex("bdbaa.*", "bdb.*daa")); Input1: bdbaa.* and bdb.*daa Output1: true Input2: .*acd and .*bce Output2: false Input3: acd.* and .*bce Output3: true 

Time complexity: O(mn) where m and n are the lengths of two given regular expression patterns. The same will be spatial complexity.

+2
source share

Here is one approach that works in polynomial time. This is a bit heavyweight, and there may be a more effective solution.

The first point, which, I think, helps revise the issue here. Instead of asking if these patterns match each other, ask this equivalent question:

Given the given patterns P1 and P2, there exists a string w, where P1 and P2 correspond to w?

In other words, instead of trying to match two patterns to each other, we will look for a string that matches each pattern.

You may have noticed that the types of patterns you can work with are a subset of regular expressions. This is useful because there is a rather complicated theory of what you can do with regular expressions and their properties. Therefore, instead of striving for your original problem, solve it even more general:

Given the two regular expressions R1 and R2, is there a string w that matches both R1 and R2?

The reason for solving this more general problem is that it allows you to use a theory developed around regular expressions. For example, in the formal theory of language we can talk about the language of a regular expression, which is a set of all lines that correspond to a regular expression. Denote this by L (R). If there is a line that matches the two regular expressions R1 and R2, then this line belongs to both L (R1) and L (R2), so our question is equivalent

For two regular expressions R1 and R2, there is a string w in L (R1) & cap; L (R 2)?

So far, all we have done is to reconsider the problem that we want to solve. Now release the solution.

The key point here is the ability to convert any regular expression into an NFA (non-deterministic finite state machine), so that every line matching the regular expression is accepted by the NFA and vice versa. Even better, the resulting NFA can be built in polynomial time. So let's start by building an NFA for each input regular expression.

Now that we have these NFAs, we want to answer this question: is there a line that both NFAs accept? And, fortunately, there is a quick way to answer this question. There is a general NFA construct called the product construct, which, given the two NFAs N1 and N2, creates a new NFA N 'that accepts all strings accepted by both N1 and N2, and no other strings. Again, this design runs in polynomial time.

Once we have N ', we basically finished! All we need to do is run a breadth or depth search in the N 'states to see if we find the receiving state. If so, great! This means that the string received by N 'means that there is a string accepted by both N1 and N2, which means that there is a string matched by both R1 and R2, so the answer to the original question is β€œyes!” And vice versa, if we cannot reach the receiving state, then the answer is "no, this is impossible."

I am sure that there is a way to do all this implicitly by doing some kind of implicit BFS on the N 'machine, without actually creating it, and this should be possible to do in something like O (n 2 ) time. If I have some more time, I will go on to this answer and talk about how to do it.

+5
source share

You can use a dynamic approach tailored for this subset of a Thompson NFA-style text expression that implements only . and * :

You can do this either using dynamic programming (here in Ruby):

 def is_match(s, p) return true if s==p len_s, len_p=s.length, p.length dp=Array.new(len_s+1) { |row| [false] * (len_p+1) } dp[0][0]=true (2..len_p).each { |j| dp[0][j]=dp[0][j-2] && p[j-1]=='*' } (1..len_s).each do |i| (1..len_p).each do |j| if p[j-1]=='*' a=dp[i][j - 2] b=[s[i - 1], '.'].include?(p[j-2]) c=dp[i - 1][j] dp[i][j]= a || (b && c) else a=dp[i - 1][j - 1] b=['.', s[i - 1]].include?(p[j - 1]) dp[i][j]=a && b end end end dp[len_s][len_p] end # 139 ms on Leetcode 

Or recursively:

 def is_match(s,p,memo={["",""]=>true}) if p=="" && s!="" then return false end if s=="" && p!="" then return p.scan(/.(.)/).uniq==[['*']] && p.length.even? end if memo[[s,p]]!=nil then return memo[[s,p]] end ch, exp, prev=s[-1],p[-1], p.length<2 ? 0 : p[-2] a=(exp=='*' && ( ([ch,'.'].include?(prev) && is_match(s[0...-1], p, memo) || is_match(s, p[0...-2], memo)))) b=([ch,'.'].include?(exp) && is_match(s[0...-1], p[0...-1], memo)) memo[[s,p]]=(a || b) end # 92 ms on Leetcode 

In each case:

  • The operational starting point in the line and pattern is at the second character looking for * , and matches one character, while s matches the character in p to *
  • Meta symbol . used as padding for the actual character. This allows any character in s match . in p
+2
source share

You can also solve this problem with backtracking, not very efficiently (because the coincidence of the same substrings can be recalculated many times, which can be improved by introducing a lookup table in which all non-matching pairs of strings are stored and calculation only happens when they cannot be found in the lookup table), but it seems to work (js, the algorithm assumes that a simple regular expression is valid, which means not starting with *, but two adjacent * [try it yourself] ):

 function canBeEmpty(s) { if (s.length % 2 == 1) return false; for (let i = 1; i < s.length; i += 2) if (s[i] != "*") return false; return true; } function match(a, b) { if (a.length == 0 || b.length == 0) return canBeEmpty(a) && canBeEmpty(b); let x = 0, y = 0; // process characters up to the next star while ((x + 1 == a.length || a[x + 1] != "*") && (y + 1 == b.length || b[y + 1] != "*")) { if (a[x] != b[y] && a[x] != "." && b[y] != ".") return false; x++; y++; if (x == a.length || y == b.length) return canBeEmpty(a.substr(x)) && canBeEmpty(b.substr(y)); } if (x + 1 < a.length && y + 1 < b.length && a[x + 1] == "*" && b[y + 1] == "*") // star coming in both strings return match(a.substr(x + 2), b.substr(y)) || // try skip in a match(a.substr(x), b.substr(y + 2)); // try skip in b else if (x + 1 < a.length && a[x + 1] == "*") // star coming in a, but not in b return match(a.substr(x + 2), b.substr(y)) || // try skip * in a ((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching match(a.substr(x), b.substr(y + 1))); // try skip char in b else // star coming in b, but not in a return match(a.substr(x), b.substr(y + 2)) || // try skip * in b ((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching match(a.substr(x + 1), b.substr(y))); // try skip char in a } 

For a little optimization, you can first normalize the lines:

 function normalize(s) { while (/([^*])\*\1([^*]|$)/.test(s) || /([^*])\*\1\*/.test(s)) { s = s.replace(/([^*])\*\1([^*]|$)/, "$1$1*$2"); // move stars right s = s.replace(/([^*])\*\1\*/, "$1*"); // reduce } return s; } // example: normalize("aa*aa*aa*bb*b*cc*cd*dd") => "aaaa*bb*ccc*ddd*" 

It is possible to further reduce the input signal: x*.* And .*x* can be replaced by .* , So to get the maximum reduction, you need to try to move as many stars as possible close to .* (So ​​moving some stars to the left may be better than moving right).

+1
source share

IIUC, you ask: "Can a regex pattern match another regex pattern?"

Yes it is possible. In particular,. matches "any character" which of course includes . and * . So, if you have a line like this:

 bdbaa.* 

How can you match this up? Well, you can do it like this:

 bdbaa.. 

Or like this:

 b.* 

Or how:

 .*ba*.* 
-2
source share

All Articles