Matching occurrence and character pattern of String2 in String1

Question

Matching occurrence and character pattern of String2 in String1

I was asked this question in a telephone interview for a summer internship, and I tried to come up with a solution to the complexity n * m (although this was also inaccurate) in Java.

I have a function that takes 2 lines, suppose "common" and "cmn". It should return True, based on the fact that "c", "m", "n" occur in the same order in "general". But if the arguments were “general” and “omn”, they will return False, because although they occur in the same order, “m” also appears after “o” (which does not allow the pattern matching condition to be fulfilled)

I worked on this with the Hashmaps and Ascii arrays, but have not yet received a convincing solution! From what I have read so far, can this be related to Boyer-Moore or Levenshtein algorithms?

Waiting for a break in stackoverflow! :)

Change Some answers talk about shortening a word or creating a hash. But, in my understanding, this question cannot be done with hashesets, because the appearance / repetition of each character in the first line has its own meaning. PASS - "con", "cmn", "cm", "cn", "mn", "on", "co". FAIL, which may seem otherwise - "com", "omn", "mon", "om". This is FALSE / FAIL because "o" occurs earlier, as well as after "m". Another example: "google", "ole" will be PASS, but "google", "gol" will fail because "o" also appears before "g"!

+8

java string algorithm

Madtest May 03 '11 at 2:34 a.m.

source share

8 answers

Great question and a few hours of research, and I think I found a solution. First of all, let me try to explain the question in a different way.

Requirements:

Let's look at the same example of "common" (mainString) and "cmn" (subString). First, we need to be clear that any characters can be repeated inside mainString, as well as substrings, and since we are concentrating in our template, the index of the character plays a big role. Therefore, we need to know:

Character Index (smallest and highest)

Lets keep it on hold and go ahead and check out the templates a bit more. For a common word, we need to find out if a particular cmn template is present or not. Different options are possible: - (Priority applies)

c → o
c → m
c → n
o → m
o → o
o → n
m → m
m → o
m → n
o → n

At any given time, this priority and comparison must be valid. Since priority plays a huge role, we need to have the index of each unique character instead of storing different patterns.

Decision

The first part of the solution is to create a hash table with the following criteria: -

Create a hash table with a key like every mainString character
Each record for a unique key in the hash table stores two indexes: i.e. lowerIndex and higherIndex
Go through mainString and for each new character update the new lowerIndex entry in Hash with the current character index in mainString.
If a collision occurs, update the current index with a record with a higher value, do so to the end of the line

The second and main part of matching the pattern: -

Set flag as false
Loop through a substring and for each character as a key, retreive details from a hash.
Do the same for the next character.

Just before the loop increment, check two conditions

 If highestIndex(current character) > highestIndex(next character) Then Pattern Fails, Flag <- False, Terminate Loop // This condition is applicable for almost all the cases for pattern matching Else If lowestIndex(current character) > lowestIndex(next character) Then Pattern Fails, Flag <- False, Terminate Loop // This case is explicitly for cases in which patterns like 'mon' appear

Display flag

NB: Since I am not so versatile in Java, I have not presented the code. But someone can try to realize my idea

+2

Nirmalgeo May 03 '11 at 9:37

source share

I myself made this question inefficient, but it gives an accurate result! I would appreciate if anyone could parse the efficient code / algorithm from this!

Create a Check function that takes 2 lines as arguments. Check each character of line 2 in line 1. The order in which each character s2 appears should be checked as true in S1.

Take character 0 from string p and go through string s to find its index of the first occurrence.
Go through the filled ascii array to find any value greater than the index of the first occurrence.
Go ahead to find the last occurrence and refresh the ascii array
Take character 1 from string p and go through string s to find the index of the first occurrence in string s
Go through the filled ascii array to find any value greater than the index of the first occurrence. if found, return False.
Go ahead to find the last occurrence and refresh the ascii array

As you can see, this is brute force ... I think O (N ^ 3)

 public class Interview { public static void main(String[] args) { if (check("google", "oge")) System.out.println("yes"); else System.out.println("sorry!"); } public static boolean check (String s, String p) { int[] asciiArr = new int[256]; for(int pIndex=0; pIndex<p.length(); pIndex++) //Loop1 inside p { for(int sIndex=0; sIndex<s.length(); sIndex++) //Loop2 inside s { if(p.charAt(pIndex) == s.charAt(sIndex)) { asciiArr[s.charAt(sIndex)] = sIndex; //adding char from s to its Ascii value for(int ascIndex=0; ascIndex<256; ) //Loop 3 for Ascii Array { if(asciiArr[ascIndex]>sIndex) //condition to check repetition return false; else ascIndex++; } } } } return true; } }

+1

Madtest May 03 '11 at 15:38

source share

Is this not possible in O (n log n)?

Step 1, reduce the string to exclude all characters that appear on the right. Strictly speaking, you only need to delete characters if they appear in the line you are checking.

 /** Reduces the maximal subsequence of characters in container that contains no * character from container that appears to the left of the same character in * container. Eg "common" -> "cmon", and "whirlygig" -> "whrlyig". */ static String reduceContainer(String container) { SparseVector charsToRight = new SparseVector(); // Like a Bitfield but sparse. StringBuilder reduced = new StringBuilder(); for (int i = container.length(); --i >= 0;) { char ch = container.charAt(i); if (charsToRight.add(ch)) { reduced.append(ch); } } return reduced.reverse().toString(); }

Step 2, check the containment.

 static boolean containsInOrder(String container, String containee) { int containerIdx = 0, containeeIdx = 0; int containerLen = container.length(), containeeLen == containee.length(); while (containerIdx < containerLen && containeeIdx < containeeLen) { // Could loop over codepoints instead of code-units, but you get the point... if (container.charAt(containerIdx) == containee.charAt(containeeIdx)) { ++containeeIdx; } ++containerIdx; } return containeeIdx == containeeLen; }

And to answer your second question, no, the Levenshtein distance will not help you, since it has the property that if you change the arguments, the result will be the same, but the algorithm you need will not.

0

Mike samuel May 03 '11 at 2:55

source share

 public class StringPattern { public static void main(String[] args) { String inputContainer = "common"; String inputContainees[] = { "cmn", "omn" }; for (String containee : inputContainees) System.out.println(inputContainer + " " + containee + " " + containsCommonCharsInOrder(inputContainer, containee)); } static boolean containsCommonCharsInOrder(String container, String containee) { Set<Character> containerSet = new LinkedHashSet<Character>() { // To rearrange the order @Override public boolean add(Character arg0) { if (this.contains(arg0)) this.remove(arg0); return super.add(arg0); } }; addAllPrimitiveCharsToSet(containerSet, container.toCharArray()); Set<Character> containeeSet = new LinkedHashSet<Character>(); addAllPrimitiveCharsToSet(containeeSet, containee.toCharArray()); // retains the common chars in order containerSet.retainAll(containeeSet); return containerSet.toString().equals(containeeSet.toString()); } static void addAllPrimitiveCharsToSet(Set<Character> set, char[] arr) { for (char ch : arr) set.add(ch); } }

Output:

 common cmn true common omn false

0

Emil May 03 '11 at 5:51

source share

I would see this as one of the worst code snippets I've ever written, or one of the worst code examples in stackoverflow ... but guess what ... all your conditions are met!
No algorithm can really fit the needs, so I just used bruteforce ... check it out ...
And I could just pay less attention to space and time complexity ... my goal was the first to try and solve it ... and maybe improve it later!

 public class SubString { public static void main(String[] args) { SubString ss = new SubString(); String[] trueconditions = {"con", "cmn", "cm", "cn", "mn", "on", "co" }; String[] falseconditions = {"com", "omn", "mon", "om"}; System.out.println("True Conditions : "); for (String str : trueconditions) { System.out.println("SubString? : " + str + " : " + ss.test("common", str)); } System.out.println("False Conditions : "); for (String str : falseconditions) { System.out.println("SubString? : " + str + " : " + ss.test("common", str)); } System.out.println("SubString? : ole : " + ss.test("google", "ole")); System.out.println("SubString? : gol : " + ss.test("google", "gol")); } public boolean test(String original, String match) { char[] original_array = original.toCharArray(); char[] match_array = match.toCharArray(); int[] value = new int[match_array.length]; int index = 0; for (int i = 0; i < match_array.length; i++) { for (int j = index; j < original_array.length; j++) { if (original_array[j] != original_array[j == 0 ? j : j-1] && contains(match.substring(0, i), original_array[j])) { value[i] = 2; } else { if (match_array[i] == original_array[j]) { if (value[i] == 0) { if (contains(original.substring(0, j == 0 ? j : j-1), match_array[i])) { value[i] = 2; } else { value[i] = 1; } } index = j + 1; } } } } for (int b : value) { if (b != 1) { return false; } } return true; } public boolean contains(String subStr, char ch) { for (char c : subStr.toCharArray()) { if (ch == c) { return true; } } return false; } }

-IvarD

0

topgun_ivard May 03 '11 at 9:39

source share

I think this test is not a test of your fundamentals in computer science, more than what you could do in the Java programming environment.

You can build a regular expression from the second argument, i.e.

 omn -> o.*m[^o]*n

... and then check the candidate string against it using String.matches (...) or using Pattern .

In general, the construction of RegExp should be in the following lines.

exp → in [0] . * + for each x: 2 → in.lenght {( in [x-1] + [^ in [x-2]] * + in [x] )}

eg:

demmn → d. * e [^ d] * m [^ e] * m [^ m] * n

0

Adrian regan May 04 '11 at 14:28

source share

I tried it myself differently. Just share your decision.

public class PatternMatch {

 public static boolean matchPattern(String str, String pat) { int slen = str.length(); int plen = pat.length(); int prevInd = -1, curInd; int count = 0; for (int i = 0; i < slen; i++) { curInd = pat.indexOf(str.charAt(i)); if (curInd != -1) { if(prevInd == curInd) continue; else if(curInd == (prevInd+1)) count++; else if(curInd == 0) count = 1; else count = 0; prevInd = curInd; } if(count == plen) return true; } return false; } public static void main(String[] args) { boolean r = matchPattern("common", "on"); System.out.println(r); }

}

0

FourOfAKind May 16 '11 at 6:49

source share

raymi · Accepted Answer · 2011-05-03T15:50:07+0000

I think it's pretty simple. Run the template and each character will receive the index of the last occurrence in the string. The index should always increase, otherwise false. So in the pseudo code:

index = -1 foreach c in pattern checkindex = string.lastIndexOf(c) if checkindex == -1 //not found return false if checkindex < index return false if string.firstIndexOf(c) < index //characters in the wrong order return false index = checkindex return true

Edit:, you can improve the code by passing index as the starting index to lastIndexOf . Then you need to compare checkindex with index , and the algorithm will be faster.

Update: Fixed a bug in the algorithm. Added an additional condition for considering the order of letters in the template.

Matching occurrence and character pattern of String2 in String1

More articles: