Regular expression to test a mathematical set

I need to check a String supplied by the user and verify that it is a valid set, possibly a set containing nested sets. Examples:

1) {1, 2, 3, 4} = valid
2) {1, 2, {3, 4}, 5} = valid
3) 1, 2, 3, 4 = invalid (missing brackets)
4) {1, 2, {3, 4, 5} = invalid (missing inner bracket)

This is the regex I am using (split up for readability):

    String elementSeparator = "(,\\s)?";
    String validElement = "(\\{?[A-Za-z0-9]*\\}?" + elementSeparator + ")*";
    String regex = "^\\{" + validElement + "\\}$";

Currently it treats the opening and closing braces of an inner set as independently optional, but I need it to accept an inner set only when both braces are present, and to reject it when one of them is missing. With my current implementation, the 4th example is accepted as a valid set.

How can I do this?

+8
java regex
3 answers

Here is some Java pseudo-code to approach this problem without using any heavy tools like ANTLR. The basic approach is to split the input into tokens consisting of

  • A single opening or closing brace
  • A comma
  • Whitespace
  • An identifier

Then you walk through the tokens, tracking the nesting level. If the nesting level is not zero when you reach the end, the input string has unbalanced curly braces.

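    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // One token: a brace, a comma, an identifier, or a run of whitespace.
    Pattern token = Pattern.compile("([{}]|,|[A-Za-z0-9]+|\\s+)");
    int nesting = 0;
    Matcher m = token.matcher(inputString);
    while (m.find()) {
        if (m.group(1).equals("{")) {
            nesting++;
        } else if (m.group(1).equals("}")) {
            nesting--;
            if (nesting < 0) {
                // error: too many closing braces
            }
        } else {
            // .... handle the other tokens here
        }
    }
    if (nesting != 0) {
        log("incorrect nesting");
    }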

Once you have this structure, you can extend it to detect things like two commas in a row: set a flag when you see a comma, and clear the flag when you see an identifier (but not whitespace). In the branches for the comma and the closing brace, check the flag and report an error if it is set, because a comma or closing brace is not valid at that point. And so on, for any other check you need.

Please note that my pseudo-code above is not a complete solution, just intended to give you a general approach. The complete solution will be somewhat more complicated, as it will have to deal with invalid characters, making the lexer (the part that breaks the string into tokens) more complicated.
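For illustration, the approach can be fleshed out roughly as follows. This is only a sketch under some assumptions that are not in the question: the class name `SetTokenValidator`, the method `isValidSet` and the `Kind` enum are made up for the example, the empty set `{}` is treated as valid, and any character that does not belong to a token makes the input invalid. Instead of a single comma flag it remembers the kind of the previous non-whitespace token, which covers the same checks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SetTokenValidator {
    // One token: a brace, a comma, an identifier, or a run of whitespace.
    private static final Pattern TOKEN =
            Pattern.compile("[{}]|,|[A-Za-z0-9]+|\\s+");

    // Kind of the previous non-whitespace token (hypothetical helper type).
    private enum Kind { START, OPEN, CLOSE, COMMA, IDENT }

    static boolean isValidSet(String input) {
        Matcher m = TOKEN.matcher(input);
        Kind prev = Kind.START;
        int nesting = 0;
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                return false; // character that belongs to no token
            }
            String t = m.group();
            pos = m.end();
            if (t.equals("{")) {
                // A set may start the input, or follow '{' or ','.
                if (prev != Kind.START && prev != Kind.OPEN && prev != Kind.COMMA) return false;
                nesting++;
                prev = Kind.OPEN;
            } else if (t.equals("}")) {
                // Not allowed directly after a comma or at the very start.
                if (prev != Kind.OPEN && prev != Kind.IDENT && prev != Kind.CLOSE) return false;
                if (--nesting < 0) return false; // too many closing braces
                prev = Kind.CLOSE;
            } else if (t.equals(",")) {
                // A comma must follow an element and must be inside a set.
                if (nesting == 0 || (prev != Kind.IDENT && prev != Kind.CLOSE)) return false;
                prev = Kind.COMMA;
            } else if (!Character.isWhitespace(t.charAt(0))) {
                // Identifier: only valid as the first element or after a comma.
                if (prev != Kind.OPEN && prev != Kind.COMMA) return false;
                prev = Kind.IDENT;
            }
            // Whitespace is skipped and leaves prev unchanged.
        }
        return nesting == 0 && prev == Kind.CLOSE;
    }

    public static void main(String[] args) {
        String[] samples = {"{1, 2, 3, 4}", "{1, 2, {3, 4}, 5}", "1, 2, 3, 4", "{1, 2, {3, 4, 5}"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidSet(s));
        }
    }
}
```

With these rules the four examples from the question come out as valid, valid, invalid and invalid, and inputs such as `{1,, 2}` or `{1, 2,}` are rejected as well.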

+4

Because the braces have to be matched, a plain regular expression will not be sufficient. You will need to learn about what are called context-free grammars. I recommend looking at ANTLR, although it will be a much heavier solution than you probably thought was necessary.
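For a grammar this small, a hand-written recursive-descent parser is one possible alternative to a generated one. The sketch below is not ANTLR and not necessarily what a full solution would look like; the grammar in the comments, the class name `SetParser` and its methods are assumptions made for the example, and the empty set `{}` is assumed to be valid.

```java
final class SetParser {
    private final String s;
    private int pos = 0;

    private SetParser(String s) {
        this.s = s;
    }

    // The whole input must be exactly one set, optionally padded by whitespace.
    static boolean isValidSet(String input) {
        SetParser p = new SetParser(input);
        p.skipSpaces();
        if (!p.parseSet()) return false;
        p.skipSpaces();
        return p.pos == p.s.length();
    }

    // set := '{' [ element (',' element)* ] '}'
    private boolean parseSet() {
        if (!accept('{')) return false;
        skipSpaces();
        if (accept('}')) return true; // empty set (assumed valid)
        do {
            skipSpaces();
            if (!parseElement()) return false;
            skipSpaces();
        } while (accept(','));
        return accept('}');
    }

    // element := identifier | set
    private boolean parseElement() {
        if (pos < s.length() && s.charAt(pos) == '{') return parseSet();
        int start = pos;
        while (pos < s.length() && isIdentChar(s.charAt(pos))) pos++;
        return pos > start; // an identifier needs at least one character
    }

    private static boolean isIdentChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    // Consumes c if it is the next character.
    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(isValidSet("{1, 2, {3, 4}, 5}"));
        System.out.println(isValidSet("{1, 2, {3, 4, 5}"));
    }
}
```

Each grammar rule maps to one method, and the recursion in `parseElement` is what lets the parser follow arbitrarily nested braces, which is exactly the part a regular expression cannot express.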

+3

An easy way would be to look for the last "{", and then the first "}" that follows it. Then check that the text between them is valid (it should be a comma-separated list). Then replace the entire span (from '{' to '}') with a dummy value, for example 0. Then repeat until you are left with just 0 or you encounter an error.
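A rough sketch of this reduction in Java might look like the following. The class name `SetReducer`, the method `isValidSet` and the `LIST` pattern are assumptions made for the example. Two details go slightly beyond the description above: the dummy value is inserted as " 0 " with surrounding spaces so that it cannot fuse with a neighbouring identifier (otherwise `{1, {2}3}` would reduce to `{1, 03}` and be accepted), and at least one reduction is required so that a bare `0` is not accepted as a set. The empty set `{}` is assumed to be valid.

```java
import java.util.regex.Pattern;

final class SetReducer {
    // A possibly empty, comma-separated list of identifiers without braces.
    private static final Pattern LIST = Pattern.compile(
            "\\s*([A-Za-z0-9]+\\s*(,\\s*[A-Za-z0-9]+\\s*)*)?");

    static boolean isValidSet(String input) {
        String s = input;
        boolean reduced = false;
        while (true) {
            int open = s.lastIndexOf('{');
            if (open < 0) break; // no sets left to reduce
            int close = s.indexOf('}', open);
            if (close < 0) return false; // '{' without a matching '}'
            // The last '{' has no '{' after it, so this span is an innermost set.
            if (!LIST.matcher(s.substring(open + 1, close)).matches()) return false;
            // Replace the innermost set with a padded dummy element.
            s = s.substring(0, open) + " 0 " + s.substring(close + 1);
            reduced = true;
        }
        // A valid input collapses to exactly one dummy element.
        return reduced && s.trim().equals("0");
    }

    public static void main(String[] args) {
        System.out.println(isValidSet("{1, 2, {3, 4}, 5}"));
        System.out.println(isValidSet("{1, 2, {3, 4, 5}"));
    }
}
```

Each pass removes one pair of braces, so the loop always terminates. The string is rebuilt on every pass, which is fine for short inputs but grows quadratically with deep nesting.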

0