Library to convert node tree

Question

Library to convert node tree

I would like to be able to express the general transformation of one tree to another without writing a bunch of repeating spaghetti code. Are there any libraries to help deal with this problem? My target language is Python, but I will look at other languages if possible for a port in Python.

Example: I would like to convert this tree to node: (please excuse S expressions )

(A (B) (C) (D))

In that:

 (C (B) (D))

So far, the parent is A and the second ancestor is C, regardless of context (there may be more parents or ancestors). I would like to express this transformation in a simple, concise and reusable manner. Of course, this example is very specific. Please try to turn to a common cause.

Edit: RefactoringNG is what I am looking for, although it introduces a completely new grammar to solve a problem that I would like to avoid. I am still looking for more and / or better examples.

Background:

I can convert python and cheetah files (don't ask!) To tokenized tree views and, in turn, convert them to lxml trees. Then I plan to reorganize the tree and write out the results for the implementation of automated refactoring. XSLT seems to be the standard tool for rewriting XML, but the syntax is terrible (in my opinion, obviously) and no one in our store will understand it.

I could write some functions that just use lxml methods (.xpath, etc.) to implement my refactoring, but I worry that I am completing a bunch of specially crafted spaghetti code that cannot be re-contained.

+8

python refactoring tree expression-trees nodes

bukzor Jan 18 '12 at 21:28

source share

2 answers

Marcin · Answer 1 · 2013-07-03T14:41:59+0000

Try this in Python code. I used strings for leaves, but this will work with any objects.

 def lift_middle_child(in_tree): (A, (B,), (C,), (D,)) = in_tree return (C, (B,), (D,)) print lift_middle_child(('A', ('B',), ('C',), ('D',))) # could use lists too

Such a tree transformation is usually better performed in a functional style - if you create a bunch of these functions, you can explicitly compose them or create a composition function to work with them in a dot-free style.

Since you used s-expressions, I assume that it is convenient for you to represent trees as nested lists (or the equivalent - if I'm not mistaken, lxml nodes are iterable in this way). Obviously this example is based on a known input structure, but your question implies this. You can write more flexible functions and still compose them if they have this uniform interface.

Here's the code in action: http://ideone.com/02Uv0i

Now, here is the function for looking back at children and using this function and above, one for lifting and reversing:

 def compose2(a,b): # might want to get this from the functional library return lambda *x: a(b(*x)) def compose(*funcs): #compose(a,b,c) = a(b(c(x))) - you might want to reverse that return reduce(compose2,funcs) def reverse_children(in_tree): return in_tree[0:1] + in_tree[1:][::-1] # slightly cryptic, but works for anything subscriptable lift_and_reverse = compose(reverse_children,lift_middle_child) # right most function applied first - if you find this confusing, reverse order in compose function. print lift_and_reverse(('A', ('B',), ('C',), ('D',)))

Ira Baxter · Answer 2 · 2012-01-18T23:55:58+0000

What do you really want, IMHO is

You will find that even if you can get a view of the XML tree of Python, trying to write an XSLT / XPath transform is more than you expect; the trees representing the actual code are more messy than you expected, XSLT is not so convenient for designation, and it cannot express directly the general conditions on the trees that you want to check (for example, that the two subtrees are the same). Last complication with XML: suppose it has been converted. How do you restore the syntax of the source code from which you came from? You need some pretty printer.

A common problem, regardless of how the code is presented, is that without information about areas and types (where you can get it), the correct conversion is quite difficult. In the end, if you are going to convert python to a language that uses different operators for concat strings and arithmetic (unlike Java, which uses "+" for both), you should be able to decide which operator to generate. Therefore, you need type information. Python may not make any difference, but in practice most expressions include variables that are of only one type for their entire life cycle. Thus, you also need flow analysis to calculate types.

Ours possesses all these features (parsing, stream analysis, pattern matching / rewriting, simpling) for many languages, including Python. (Although it does have stream analysis capabilities created for C, COBOL, Java, it is not created for Python. But then you said you want to do the conversion regardless of context).

To express your correspondence in DMS on Python syntax next to your example (which is not Python?)

  domain Python; rule revise_arguments(f:IDENTIFIER,A:expression,B:expression, C:expression,D:expression):primary->primary = " \f(\A,(\B),(\C),(\D)) " -> " \f(\C,(\B),(\D)) ";

The notation above is DMS Rule Rewriting Language (RSL). "..." are meta-libraries that share Python syntax (inside these quotes, DMS knows that it is Python because of the domain notation declaration) from the RSL DMS language. \ N inside the meta-quote refers to syntactic variable variables of the named nonterminal types defined in the rule parameter list. Yes, (...) inside the metaotes there is Python () ... they exist in the syntax trees with respect to the DMS, because they, like the rest of the language, are just syntax.

The above rule looks a little strange, because I try to follow your example as close as possible, and from the point of view of the language of expression and expression, your example is odd precisely because it has unusual parentheses.

Using this rule, DMS can parse Python (using its Python parser), for example

  foobar(2+3,(xy),(p),(baz()))

build an AST, map the rule (parsed-to-AST) to that AST, rewrite it to another AST corresponding to:

  foobar(p,(xy),(baz()))

and then type in the original (valid) python syntax.

If you assumed that your example is a LISP code conversion, you would need a LISP grammar for DMS (it’s not difficult to build, but we have little to call it) and write the appropriate surface syntax:

  domain Lisp; rule revise_form(A:form,B:form, C:form, D:form):form->form = " (\A,(\B),(\C),(\D)) " -> " (\C,(\B),(\D)) ";

You can better understand this by looking.

If your goal is to implement all this in Python ... I don't have much help. DMS is a fairly large system, and there would be a lot of effort to replicate.

Library to convert node tree

More articles: