Interesting question. And phant0m's answer is very educational (and should be used if you are comfortable with parsers).
If you want to do this using regular expressions only, the following solution correctly validates an arbitrarily nested logical statement using JavaScript.
Rules / Assumptions:
- A valid statement consists only of numbers, parentheses, spaces, the logical operator AND and the logical operator OR.
- A statement must contain at least two "tokens" separated by a logical operator, where each token is either a "number" or a "parenthesized unit".
- A "number" token is an integer having one or more decimal digits preceded by an optional sign (either + or -).
- A "parenthesized unit" token is two or more tokens separated by a logical operator and enclosed in matching opening and closing parentheses.
- The statement as a whole may contain more than two tokens, but all tokens must be separated by the same single operator; either AND or OR.
- Each parenthesized unit may contain more than two tokens, but all tokens must be separated by the same single operator; either AND or OR.
- Any amount of whitespace may appear between any elements (parentheses, numbers and logical operators), but at least one space is required between a number and a logical operator.
- The logical operators AND and OR are not case sensitive.
Examples of valid logical statements:
"1 AND 2"
"1 AND 2 AND 3"
"1 OR 2"
"-10 AND -20"
"100 AND +200 AND -300"
"1 AND (2 OR 3)"
"1 AND (2 OR 3) AND 4"
"1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))"
"( 1 and 2 ) AND (1 AND 2)"
Examples of invalid logical statements:
"1x"
Regular Expression Solution:
This problem requires matching nested parenthesized structures, and the JavaScript regex engine does not support recursive expressions, so it cannot be solved in one pass with a single regular expression. However, the problem can be split into two parts, each of which can be solved with a single JavaScript regular expression. The first regular expression matches the innermost parenthesized units, and the second validates the simplified logical statement (which has no parentheses).
Regex #1: match an innermost parenthesized unit.
The following regular expression matches one parenthesized unit, which consists of two or more number tokens, all separated by either AND or OR, with at least one whitespace character between numbers and logical operators. The regular expression is fully commented and formatted for readability in PHP free-spacing mode syntax:
    $re_paren = '/ # Match innermost "parenthesized unit".
        \(                  # Start of innermost paren group.
        \s*                 # Optional whitespace.
        [+-]?\d+            # First number token (required).
        (?:                 # ANDs or ORs (required).
          (?:               # Either multiple AND separated values.
            \s+             # Required whitespace.
            AND             # Logical operator.
            \s+             # Required whitespace.
            [+-]?\d+        # Additional number.
          )+                # multiple AND separated values.
        | (?:               # Or multiple OR separated values.
            \s+             # Required whitespace.
            OR              # Logical operator.
            \s+             # Required whitespace.
            [+-]?\d+        # Additional number token.
          )+                # multiple OR separated values.
        )                   # ANDs or ORs (required).
        \s*                 # Optional whitespace.
        \)                  # End of innermost paren group.
        /ix';
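JavaScript regex syntax has no free-spacing mode, so one way to keep the same pattern readable there is to assemble it from commented string pieces. The following is only a minimal sketch of that idea; the names `NUM`, `AND_LIST`, `OR_LIST`, `reParen` and `findInnermostUnits` are made up for illustration.

```javascript
// Illustrative names only; the pattern itself mirrors the commented regex above.
var NUM = "[+-]?\\d+";                         // Number token: optional sign, digits.
var AND_LIST = "(?:\\s+AND\\s+" + NUM + ")+";  // One or more " AND <number>".
var OR_LIST = "(?:\\s+OR\\s+" + NUM + ")+";    // One or more " OR <number>".

var reParen = new RegExp(
    "\\(\\s*" +                                // Opening paren, optional whitespace.
    NUM +                                      // First number token (required).
    "(?:" + AND_LIST + "|" + OR_LIST + ")" +   // All ANDs or all ORs (required).
    "\\s*\\)",                                 // Optional whitespace, closing paren.
    "ig"                                       // Case-insensitive, find all matches.
);

function findInnermostUnits(text) {
    // With the g flag, match() returns every non-overlapping match, or null.
    return text.match(reParen) || [];
}

console.log(findInnermostUnits("1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))"));
// → [ '(2 AND 3 AND 4)', '(5 OR 6 OR 7)' ]
```

Only the two innermost units match; the outer pair of parentheses is skipped because it does not start with a number token.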
Regex #2: validate the simplified logical statement.
Here is an (almost identical, apart from the boundary anchors) regular expression that validates a simplified logical statement (having only numbers and logical operators, with no parentheses). Here it is in commented free-spacing mode syntax (PHP):
    $re_valid = '/ # Validate simple logical statement (no parens).
        ^                   # Anchor to start of string.
        \s*                 # Optional whitespace.
        [+-]?\d+            # First number token (required).
        (?:                 # ANDs or ORs (required).
          (?:               # Either multiple AND separated values.
            \s+             # Required whitespace.
            AND             # Logical operator.
            \s+             # Required whitespace.
            [+-]?\d+        # Additional number.
          )+                # multiple AND separated values.
        | (?:               # Or multiple OR separated values.
            \s+             # Required whitespace.
            OR              # Logical operator.
            \s+             # Required whitespace.
            [+-]?\d+        # Additional number token.
          )+                # multiple OR separated values.
        )                   # ANDs or ORs (required).
        \s*                 # Optional whitespace.
        $                   # Anchor to end of string.
        /ix';
Note that these two regular expressions are identical except at their boundaries: the first is delimited by literal parentheses, the second by the ^ and $ anchors.
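To see the second regex on its own, here is a small sketch that applies it to statements that contain no parentheses. The helper name `isValidFlatStatement` is hypothetical. The `g` flag is left off in this sketch because `test()` on a global regex keeps state in `lastIndex` between calls.

```javascript
// Same pattern as the commented regex above, in compact JavaScript syntax.
var reValid = /^\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*$/i;

// Hypothetical helper: validates a statement that has no parentheses.
function isValidFlatStatement(text) {
    return reValid.test(text);
}

console.log(isValidFlatStatement("1 AND 2"));               // → true
console.log(isValidFlatStatement("100 and +200 and -300")); // → true
console.log(isValidFlatStatement("1 AND 2 OR 3"));          // → false (mixed operators)
console.log(isValidFlatStatement("1"));                     // → false (no operator)
console.log(isValidFlatStatement("1x"));                    // → false
```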
JavaScript solution:
The tested JavaScript function below uses the two regular expressions above to solve the problem:
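    function isValidLogicalStatement(text) {
        var re_paren = /\(\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*\)/ig;
        var re_valid = /^\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*$/ig;
        // Iterate from the inside out.
        while (text.search(re_paren) !== -1) {
            // Replace innermost units; "0" is an assumed placeholder number token.
            text = text.replace(re_paren, "0");
        }
        return text.search(re_valid) === 0;
    }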
The function works iteratively from the inside out: it matches the innermost parenthesized units and replaces each one with a single number token, repeating until none remain, and then checks that the resulting statement (now without parentheses) is valid.
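To make the inside-out reduction visible, the following sketch records the statement after each replacement pass. The function name `traceReduction` and the placeholder token "0" are assumptions chosen for illustration; any number token would serve as the placeholder.

```javascript
// Hypothetical helper: returns the statement after each reduction pass.
function traceReduction(text) {
    var reParen = /\(\s*[+-]?\d+(?:(?:\s+AND\s+[+-]?\d+)+|(?:\s+OR\s+[+-]?\d+)+)\s*\)/ig;
    var steps = [text];
    // Replace every innermost parenthesized unit with the placeholder "0".
    var reduced = text.replace(reParen, "0");
    // Each successful pass shortens the string, so the loop always terminates.
    while (reduced !== text) {
        text = reduced;
        steps.push(text);
        reduced = text.replace(reParen, "0");
    }
    return steps;
}

console.log(traceReduction("1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))"));
// → [ '1 OR ((2 AND 3 AND 4) OR (5 OR 6 OR 7))', '1 OR (0 OR 0)', '1 OR 0' ]
```

The last entry is what the parenthesis-free validation regex is then applied to. For an invalid input such as "1 AND (2 OR (3))", nothing can be reduced, the parentheses remain, and the final check fails.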
Addendum: 2012-11-06
In a comment on this answer, the OP now says that there must be spaces between numbers and operators, and that a number or parenthesized unit may NOT stand on its own. Given these additional requirements, I have updated the answer above.