How do I convert source code to an XML representation of its AST?

I want to get an XML representation of Java and C code. I already asked this question 3 months ago, but the solutions were not convenient for me:

  • srcml seems to be a good solution for this problem, but it does not support line and column numbers, and I need that feature.
  • about elsa, to quote: "Currently, efforts are underway to export the Elsa AST as an XML document; we expect to be able to advertise this in the next public release."
  • dms ... I did not understand it.
  • specifically for Java, there is javaml, which supports line numbers, but its SourceForge page does not contain any files.

Question: is there a tool that converts an AST to XML and supports line numbers (and column numbers), especially for Java and C / C++? Is there an alternative to javaml and srcml?

PS: I am not looking for parser generators. I hope to find a tool that can be used from the console by typing: ./my-xml-generator Test.java [or something like that] ... or a Java implementation would be great too.

+5
5 answers

A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

I implemented it myself; it converts PHP and Java to XML. It's free, so enjoy!

Oana.

+3

What did you not understand about DMS?

It exists.

It has accurate parsers / compiler front ends for C, C++, Java, C#, COBOL (and many other languages).

It automatically builds complete abstract syntax trees for whatever it parses. Each AST node carries a file / line / column stamp for the token that represents the start of that node, and the final column can be computed with a DMS API call.

It has a built-in option for generating XML from the AST, complete with node type, source position (as above), and any associated literal value. The command-line call is:

run DMSDomainParser ++XML <path_to_your_file> 

You can see what this XML result looks like for Java.

You probably don't really want what you are asking for. A 1000-line C program can pull in 100K lines of #include files. Each line produces between 5 and 10 nodes. The DMS XML output is succinct, and each node takes only one line, so you are looking at ~1 million lines of XML at 60 characters each: 60 million characters. That is a big file, and you probably do not want to process it with an XML tool.
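The size estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `estimate_xml_size` is made up for illustration, and the inputs (100K source lines after #include expansion, 5 to 10 nodes per line, one 60-character XML line per node) are the figures quoted in this answer, not measurements.

```python
def estimate_xml_size(source_lines, nodes_per_line, chars_per_xml_line=60):
    """Return (xml_lines, xml_chars), assuming one XML line per AST node."""
    xml_lines = source_lines * nodes_per_line
    return xml_lines, xml_lines * chars_per_xml_line


if __name__ == "__main__":
    # Figures quoted in the answer: 100K lines, 5-10 nodes per line.
    for nodes_per_line in (5, 10):
        lines, chars = estimate_xml_size(100_000, nodes_per_line)
        print(f"{nodes_per_line} nodes/line: {lines:,} XML lines, {chars:,} characters")
```

The upper bound (10 nodes per line) is the "1 million lines, 60 million characters" case mentioned above; the lower bound is half of that.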

DMS itself provides a huge infrastructure for working with the ASTs it builds: traversal, pattern matching (with patterns written essentially in source form), source-to-source transformation, control flow, data flow, points-to analysis, global call graphs. It is surprisingly hard to replicate all this machinery, and you will probably need it to do anything interesting.

Moral: it is much better to use something like DMS to manipulate the AST directly than to fight with XML.

Full disclosure: I am the architect behind DMS.

+2

There is GCC-XML at http://www.gccxml.org/HTML/Index.html. Caveat: I have not actually used it myself.

+1

For Java only, you can use BeautyJ.

You can run it against your file with the -xml.* options. For example:

 java -cp /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java 

... and you get the XML representation of this file (and of the files it includes).

BTW: the option "-xml.out=" specifies the output file. Used this way, with the trailing "=", it writes to STDOUT. It's not a mistake.

0

srcml does support line and column numbers. Here is an example using a Java file called input.java (note that srcml supports several languages, including C / C++), which contains the following:

public class HelloWorld {
    public static void main(String[] args) {
        // Prints "Hello, World" to the terminal window.
        System.out.println("Hello, World");
    }
}

Then run srcml with the option that enables tracking of this additional position information:

 srcml input.java --position 

It produces the following AST in XML format, with line and column numbers:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{ <function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{ <comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment> <expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" 
pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt> }<pos:position pos:line="5" pos:column="6"/></block></function> }<pos:position pos:line="6" pos:column="2"/></block></class></unit> 

Reference: the documentation for srcml v0.9.5 (see srcml --help). I use srcml often myself, including this feature, to get position information.
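Once you have this output, the positions can be read back with any namespace-aware XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name `positioned_elements` is hypothetical, and `SAMPLE` is an abridged fragment of the output shown above (only the namespace declarations and the first two positioned elements). It assumes, based on that output, that the `<pos:position/>` children mark where an element ends, so they are skipped.

```python
import xml.etree.ElementTree as ET

SRC_NS = "http://www.srcML.org/srcML/src"
POS_NS = "http://www.srcML.org/srcML/position"

# Abridged fragment of the `srcml input.java --position` output shown above.
SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src" '
    'xmlns:pos="http://www.srcML.org/srcML/position" language="Java">'
    '<class><specifier pos:line="1" pos:column="1">public'
    '<pos:position pos:line="1" pos:column="7"/></specifier> class '
    '<name pos:line="1" pos:column="14">HelloWorld'
    '<pos:position pos:line="1" pos:column="24"/></name>'
    '</class></unit>'
)


def positioned_elements(xml_text):
    """Yield (tag, text, line, column) for each element with a start position."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Skip <pos:position/> markers (assumed to record end positions).
        if elem.tag.startswith("{" + POS_NS + "}"):
            continue
        line = elem.get("{" + POS_NS + "}line")
        column = elem.get("{" + POS_NS + "}column")
        if line is None or column is None:
            continue  # wrapper elements such as <class> carry no position
        tag = elem.tag.replace("{" + SRC_NS + "}", "")
        yield tag, (elem.text or "").strip(), int(line), int(column)


if __name__ == "__main__":
    for tag, text, line, column in positioned_elements(SAMPLE):
        print(f"{tag}: {text!r} at line {line}, column {column}")
```

For a real file, you would save the srcml output and pass its contents to the same function (or switch to ET.parse on the file); the namespace URIs are the ones declared on the `<unit>` element above.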

0
