Parse XPath Expressions

I am trying to build an "AET" (abstract expression tree) for XPath, since I am writing a WYSIWYG XSL editor. I have been banging my head against the wall with the XPath BNF for the last three to four hours.

I then thought of a different solution: write a class that implements IXPathNavigable and returns my own XPathNavigator from CreateNavigator. This XPathNavigator would succeed on every method call and record the calls, for example that we moved to the node clients and then to the node client. I could then (hopefully) use this information to build the "AET", so that we would end up with clients/client in the object model.

The only question is: how do I run an XPathExpression against an IXPathNavigable?

I know this is extremely lazy, but has anyone else gone through this effort and written an XPath expression parser? I have not yet built a proof of concept of my idea, because I cannot test it (I cannot run an XPathExpression against an IXPathNavigable), so I don't even know whether it will work.

2 answers

There is an ANTLR XPath grammar here. Since its license permits it, I have copied the whole grammar below to guard against link rot.

    grammar xpath;

    /*
    XPath 1.0 grammar. Should conform to the official spec at
    http://www.w3.org/TR/1999/REC-xpath-19991116. The grammar
    rules have been kept as close as possible to those in the
    spec, but some adjustments were unavoidable. These were
    mainly removing left recursion (spec seems to be based on
    LR), and to deal with the double nature of the '*' token
    (node wildcard and multiplication operator). See also
    section 3.7 in the spec. These rule changes should make
    no difference to the strings accepted by the grammar.

    Written by Jan-Willem van den Broek
    Version 1.0

    Do with this code as you will.
    */
    /*
        Ported to Antlr4 by Tom Everett <tom@khubla.com>
    */

    main  :  expr
      ;

    locationPath
      :  relativeLocationPath
      |  absoluteLocationPathNoroot
      ;

    absoluteLocationPathNoroot
      :  '/' relativeLocationPath
      |  '//' relativeLocationPath
      ;

    relativeLocationPath
      :  step (('/'|'//') step)*
      ;

    step  :  axisSpecifier nodeTest predicate*
      |  abbreviatedStep
      ;

    axisSpecifier
      :  AxisName '::'
      |  '@'?
      ;

    nodeTest:  nameTest
      |  NodeType '(' ')'
      |  'processing-instruction' '(' Literal ')'
      ;

    predicate
      :  '[' expr ']'
      ;

    abbreviatedStep
      :  '.'
      |  '..'
      ;

    expr  :  orExpr
      ;

    primaryExpr
      :  variableReference
      |  '(' expr ')'
      |  Literal
      |  Number
      |  functionCall
      ;

    functionCall
      :  functionName '(' ( expr ( ',' expr )* )? ')'
      ;

    unionExprNoRoot
      :  pathExprNoRoot ('|' unionExprNoRoot)?
      |  '/' '|' unionExprNoRoot
      ;

    pathExprNoRoot
      :  locationPath
      |  filterExpr (('/'|'//') relativeLocationPath)?
      ;

    filterExpr
      :  primaryExpr predicate*
      ;

    orExpr  :  andExpr ('or' andExpr)*
      ;

    andExpr  :  equalityExpr ('and' equalityExpr)*
      ;

    equalityExpr
      :  relationalExpr (('='|'!=') relationalExpr)*
      ;

    relationalExpr
      :  additiveExpr (('<'|'>'|'<='|'>=') additiveExpr)*
      ;

    additiveExpr
      :  multiplicativeExpr (('+'|'-') multiplicativeExpr)*
      ;

    multiplicativeExpr
      :  unaryExprNoRoot (('*'|'div'|'mod') multiplicativeExpr)?
      |  '/' (('div'|'mod') multiplicativeExpr)?
      ;

    unaryExprNoRoot
      :  '-'* unionExprNoRoot
      ;

    qName  :  nCName (':' nCName)?
      ;

    functionName
      :  qName  // Does not match nodeType, as per spec.
      ;

    variableReference
      :  '$' qName
      ;

    nameTest:  '*'
      |  nCName ':' '*'
      |  qName
      ;

    nCName  :  NCName
      |  AxisName
      ;

    NodeType:  'comment'
      |  'text'
      |  'processing-instruction'
      |  'node'
      ;

    Number  :  Digits ('.' Digits?)?
      |  '.' Digits
      ;

    fragment
    Digits  :  ('0'..'9')+
      ;

    AxisName:  'ancestor'
      |  'ancestor-or-self'
      |  'attribute'
      |  'child'
      |  'descendant'
      |  'descendant-or-self'
      |  'following'
      |  'following-sibling'
      |  'namespace'
      |  'parent'
      |  'preceding'
      |  'preceding-sibling'
      |  'self'
      ;

    PATHSEP  : '/';
    ABRPATH  : '//';
    LPAR     : '(';
    RPAR     : ')';
    LBRAC    : '[';
    RBRAC    : ']';
    MINUS    : '-';
    PLUS     : '+';
    DOT      : '.';
    MUL      : '*';
    DOTDOT   : '..';
    AT       : '@';
    COMMA    : ',';
    PIPE     : '|';
    LESS     : '<';
    MORE_    : '>';
    LE       : '<=';
    GE       : '>=';
    COLON    : ':';
    CC       : '::';
    APOS     : '\'';
    QUOT     : '\"';

    Literal  :  '"' ~'"'* '"'
      |  '\'' ~'\''* '\''
      ;

    Whitespace
      :  (' '|'\t'|'\n'|'\r')+ ->skip
      ;

    NCName  :  NCNameStartChar NCNameChar*
      ;

    fragment
    NCNameStartChar
      :  'A'..'Z'
      |  '_'
      |  'a'..'z'
      |  '\u00C0'..'\u00D6'
      |  '\u00D8'..'\u00F6'
      |  '\u00F8'..'\u02FF'
      |  '\u0370'..'\u037D'
      |  '\u037F'..'\u1FFF'
      |  '\u200C'..'\u200D'
      |  '\u2070'..'\u218F'
      |  '\u2C00'..'\u2FEF'
      |  '\u3001'..'\uD7FF'
      |  '\uF900'..'\uFDCF'
      |  '\uFDF0'..'\uFFFD'
    // Unfortunately, java escapes can't handle this conveniently,
    // as they're limited to 4 hex digits. TODO.
    //  |  '\U010000'..'\U0EFFFF'
      ;

    fragment
    NCNameChar
      :  NCNameStartChar | '-' | '.' | '0'..'9'
      |  '\u00B7' | '\u0300'..'\u036F'
      |  '\u203F'..'\u2040'
      ;

I have written both an XPath parser and an implementation of IXPathNavigable (I used to be an XMLPrime developer). Neither is easy, and I suspect that IXPathNavigable will not be the cheap win you are hoping for, as there is a lot of subtlety in the interactions between the various methods. I suspect that a full XPath parser will be simpler (and more reliable).

To answer your question:

 var results = xpathNavigable.CreateNavigator().Evaluate("/my/xpath[expression]");

You will probably need to enumerate the results to force the node navigation to actually happen.
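As a minimal sketch of running a compiled XPathExpression against an IXPathNavigable: the class and method names `XPathDemo` and `SelectNames` are hypothetical, and an `XmlDocument` is assumed only as a convenient stand-in for a custom IXPathNavigable. The expression is compiled with `XPathExpression.Compile`, and the iterator is walked to the end so that the navigation really takes place.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper, for illustration only.
public static class XPathDemo
{
    // Runs a node-set XPath against any IXPathNavigable and
    // returns the names of the nodes it selects.
    public static List<string> SelectNames(IXPathNavigable source, string xpath)
    {
        // Compile once; the same expression can be reused on other navigators.
        XPathExpression expr = XPathExpression.Compile(xpath);
        XPathNavigator nav = source.CreateNavigator();

        var names = new List<string>();
        // Select is lazy: nodes are only visited as the iterator advances.
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
        {
            names.Add(it.Current.Name);
        }
        return names;
    }
}
```

For example, loading `<clients><client id='1'/><client id='2'/></clients>` into an `XmlDocument` and calling `XPathDemo.SelectNames(doc, "/clients/client")` yields two entries, both `client`. For expressions that do not produce a node-set (a number, string, or boolean), `nav.Evaluate(expr)` would be the call to use instead of `Select`.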

If you always return true, then all you will learn from the following XPath is that it looks for bar children of foo, because not(bar) then rejects every foo and the rest of the path is never navigated: foo[not(bar)]/other/elements

If you always return a fixed number of nodes, then you will never find out about most of this XPath: a[100]/b/c/

Essentially, this will not work.
