Tips for parsing string expressions?

I was bored during the holiday season this year and accidentally decided to write a simple library for understanding / filtering libraries for Java (I know that there are excellent ones, I just wanted to write this myself, damn it).

For this list:

LinkedList<Person> list = new LinkedList<Person>(); list.add(new Person("Jack", 20)); list.add(new Person("Liz", 58)); list.add(new Person("Bob", 33)); 

Syntax:

 Iterable<Person> filtered = Query.from(list).where( Condition.ensure("Age", Op.GreaterEqual, 21) .and(Condition.ensure("Age", Op.LessEqual, 50)); 

I know its ugly, but if I use static imports and use shorter method names, it gets pretty short.

The following syntax is the final goal:

 Iterable<Person> list2 = Query.from(list).where("x=> x.Age >= 21 & x.Age <= 50"); 

Obviously, parsing expressions is not my strongest area, I am having problems parsing nested / multiple conventions. Does anyone know of some resources / literature that might prove useful?

I have only individual conditional expressions that are successfully processed from the syntax of the lambda string at the moment: "x=> x.Name == Jack" . My basic Expression structure is pretty solid and can easily handle any amount of nesting, the problem is just parsing the expression from the string.

thanks

Just for beats, here is a little insight into how the structure of an expression behind the scenes can work (obviously, I could specify "op.GreaterEqual", etc ... in the following example, but I wanted to demonstrate how flexible it is for any amount of nesting):

 Condition minAge1 = Condition.ensure("Age", Op.Equal, 20); Condition minAge2 = Condition.ensure("Age", Op.Greater, 20); Expression minAge = new Expression(minAge1, Express.Or, minAge2); Expression maxAge = Condition.ensure("Age", Op.Equal, 50).or(Condition.ensure("Age", Op.Less, 50)); Expression ageExpression = new Expression(minAge, Express.And, maxAge); Condition randomException = Condition.ensure("Name", Op.Equal, "Liz"); Expression expressionFinal = new Expression(ageExpression, Express.Or, randomException); 
+6
java lambda parsing list-comprehension
source share
3 answers

Basically, you will need a recursive descent parser for expressions. This is a topic widely used in compiler theory, so any book on compilers will cover this topic. In formal grammar terms, it will look something like this:

 condition : orAtom ('||' orAtom)+ ; orAtom : atom ('&&' atom)+ ; atom : '(' condition ')' | expression ; expression : value OPER value ; value : VARIABLE | LITERAL ' VARIABLE : (LETTER | '_') (LETTER | DIGIT | '_')* ; LITERAL : NUMBER | STRING ; NUMBER : '-'? DIGIT+ ('.' DIGIT+)? ; STRING : '"' . CHAR* . '"' ' CHAR : ('\\' | '\"' | .) + ; LETTER : 'a'..'z' | 'A'..'Z' ; DIGIT : '0'..'9' ; OPER : '>' | '>=' | '<' | '<=' | '=' | '!=' ; 

The grammar above is (mostly) in the form of ANTLR as the one with which I am most familiar.

Parsing logical or arithmetic expressions is a classic introductory topic when you're parsing so you can find a lot of literature. If you want to chase ANTLR (ever since you use Java), I highly recommend reading The Definitive ANTLR Reference: Creating Domain Languages .

If all this looks redundant and all that is needed, you may be right. This is a difficult topic to get started.

You have an alternative not to create an arbitrary string expression, but instead use a free interface (for example, you do):

 List results = from(source) .where(var("x").greaterThan(25), var("x").lessThan(50)) .select("field1", "field2"); 

as this indicates an expression tree in the code and should be easier to implement.

+5
source share

To add cletus to the answer, you first want to define your grammar.

The following expressive grammar works very well for most cases, unfortunately, normal recursive descent does not allow you to define the recursive part first in each piece. This will cause you to invoke the production method recursively until you get a stack overflow.

  orexpr :: = orexpr '|'  andexpr  
                   |  andexpr  

         andexpr :: = andexpr '&' comparison
                    |  comparison

         comparison :: = addexpr compareOp addexpr
                      |  addexpr

         addexpr :: = addexpr '+' mulexpr
                    |  addexpr '-' mulexpr
                    |  mulexpr

         mulexpr :: = mulexpr '*' value
                    |  mulexpr '/' value
                    |  mulexpr '%' value
                    |  value

         value :: = integer
                    |  float
                    |  variable
                    |  quotation
                    |  '(' orexpr ')'

Normal recursive descent will require you to define mulexpr, for example:

  mulexpr :: = value '*' mulexpr  
                     |  value '/' mulexpr
                     |  value '%' mulexpr

But the problem with this grammar is that the expression tree will be built so that your order of operations is reversed.

Compromise: use the recursive descent in the opposite direction on the original grammar written above. Parse the expression from right to left. Build your tree from right to left. It will keep the order of operations.

In recursive descents, you usually write an analysis method for each product. The parseOr () method may look like this:

  private MyExpression parseOr (MyScanner scanner) {
         MyExpression expression = null;

         MyExpression rightExpr = parseAnd (scanner);
         Token token = scanner.getCurrentToken ();
         if (token.hasValue ("|") {
             expression = new MyExpression ();
             expression.setOperator (OR);
             Token nextToken = scanner.getNextToken ();  // remember, this is scanning in reverse
             MyExpression leftExpression = parseOr (scanner);
             expression.setLeft (leftExpression);
             expression.setRight (rightExpression);
         }
         else {
             expression = rightExpression;
         }
         return expression;
     } 

+1
source share

Thanks for all the guys advice. I decided that most of them were much bigger than I needed, so I ended up putting the hell out of everything to gain access to managed groups that I could easily parse in 20-30 lines of code.

Ive got the LambdaExpression interface, which works almost as well as the free interface, with only one or two small errors.

I will probably continue to develop it a little just for fun, but it is clearly too inefficient to really use it with a 90% reflection.

+1
source share

All Articles