How to convert source code to XML representation of ast object?

Question

I want to get an XML representation of the AST of Java and C code. Three months ago I already asked this question, but the solutions were not convenient for me:

srcML seems to be a good solution for this problem, but it does not support line and column numbers, and I need that feature.
About Elsa, to quote: "Currently, efforts are underway to export the Elsa AST as an XML document; we expect to be able to advertise this in the next public release."
DMS: I did not understand this one.
Specifically for Java, there is JavaML, which supports line numbers, but its SourceForge page does not contain any files.

Question: is there software that converts an AST to XML and supports line (and column) numbers, especially for Java and C/C++? Is there an alternative to JavaML and srcML?

PS: I do not want a parser generator. I am hoping for a tool that can be used from the console by typing ./my-xml-generator Test.java (or something like that). A Java implementation would also be great.

+5

xml abstract-syntax-tree code-conversion

autobiographer May 12, '10 at 10:26

5 answers

Veni_vidi_vici · Answer 1 · 2012-08-13T02:38:45+0000

A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

I implemented it myself; it converts PHP and Java to XML. It's free, so enjoy!

Oana.

Ira Baxter · Answer 2 · 2010-05-14T01:36:47+0000

What did you not understand about DMS?

It exists.

It has compiler-accurate parsers / front ends for C, C++, Java, C#, COBOL (and many other languages).

It automatically builds complete abstract syntax trees for whatever it parses. Each AST node is stamped with the file, line and column of the token that starts the node, and the final column can be computed with a DMS API call.

It has a built-in option for generating XML from AST, complete with node type, source position (as above) and any associated literal value. Command line call:

run DMSDomainParser ++XML <path_to_your_file>

You can see what this XML result for Java looks like.

But you probably don't really want what you are asking for. A 1000-line C program can pull in 100K lines of #include files. A line produces between 5 and 10 nodes. The DMS XML output is succinct, and each node takes only one line, so you are looking at roughly 1 million lines of XML at 60 characters each, or 60 million characters. That is a large file, and you probably do not want to process it with an XML-based tool.
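The size estimate above is plain arithmetic and can be reproduced with a short sketch. The figures are the rough ones quoted in this answer, not measurements, and the function name is made up for illustration.

```python
def estimate_xml_size(source_lines, nodes_per_line, chars_per_xml_line=60):
    """Return (xml_lines, xml_chars), assuming one XML line per AST node."""
    xml_lines = source_lines * nodes_per_line
    return xml_lines, xml_lines * chars_per_xml_line


# 100K lines after #include expansion, 5 to 10 nodes per source line.
low = estimate_xml_size(100_000, 5)
high = estimate_xml_size(100_000, 10)
print("low estimate: ", low)
print("high estimate:", high)
```

The upper bound corresponds to the 1 million lines and 60 million characters mentioned above.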

DMS itself provides a huge infrastructure for working with the ASTs it builds: traversal, pattern matching (against patterns written essentially in source form), source-to-source transformation, control flow, data flow, points-to analysis, global call graphs. It is surprisingly hard to replicate all this machinery, and you will probably need it to do anything interesting.

Moral: it is much better to use something like DMS to manipulate the AST directly than to fight with XML.

Full disclosure: I am the architect behind DMS.

anon · Answer 3 · 2010-05-12T10:28:35+0000

There is GCC-XML at http://www.gccxml.org/HTML/Index.html. Caveat: I have not actually used it myself.

St0rm · Answer 4 · 2012-09-19T13:23:05+0000

For Java only, you can use BeautyJ.

You can run it against your file with the -xml.* options. For example:

 java -cp /your/dir/BeautyJ/lib/beautyj.jar beautyj -xml.out= -xml.doctype your_file.java

... and you get the XML representation of this file (and of the files it includes).

BTW: the option "-xml.out=" specifies the output file. Used this way, with a trailing "=" and no file name, the output goes to STDOUT. It's not a mistake.

Agrajag · Answer 5 · 2017-09-14T00:59:26+0000

srcML supports line and column numbers. Here is an example using a Java file called input.java (remember that srcML supports several languages, including C/C++), which contains the following:

 public class HelloWorld {
     public static void main(String[] args) {
         // Prints "Hello, World" to the terminal window.
         System.out.println("Hello, World");
     }
 }

Then run srcml with the --position option to enable tracking of this additional position information:

 srcml input.java --position

It produces the following AST in XML format, with line and column numbers:

 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <unit xmlns="http://www.srcML.org/srcML/src" xmlns:pos="http://www.srcML.org/srcML/position" revision="0.9.5" language="Java" filename="input.java" pos:tabs="8"><class><specifier pos:line="1" pos:column="1">public<pos:position pos:line="1" pos:column="7"/></specifier> class <name pos:line="1" pos:column="14">HelloWorld<pos:position pos:line="1" pos:column="24"/></name> <block pos:line="1" pos:column="25">{ <function><specifier pos:line="2" pos:column="5">public<pos:position pos:line="2" pos:column="11"/></specifier> <specifier pos:line="2" pos:column="12">static<pos:position pos:line="2" pos:column="18"/></specifier> <type><name pos:line="2" pos:column="19">void<pos:position pos:line="2" pos:column="23"/></name></type> <name pos:line="2" pos:column="24">main<pos:position pos:line="2" pos:column="28"/></name><parameter_list pos:line="2" pos:column="28">(<parameter><decl><type><name><name pos:line="2" pos:column="29">String<pos:position pos:line="2" pos:column="35"/></name><index pos:line="2" pos:column="35">[]<pos:position pos:line="2" pos:column="37"/></index></name></type> <name pos:line="2" pos:column="38">args<pos:position pos:line="2" pos:column="42"/></name></decl></parameter>)<pos:position pos:line="2" pos:column="43"/></parameter_list> <block pos:line="2" pos:column="44">{ <comment type="line" pos:line="3" pos:column="9">// Prints "Hello, World" to the terminal window.</comment> <expr_stmt><expr><call><name><name pos:line="4" pos:column="9">System<pos:position pos:line="4" pos:column="15"/></name><operator pos:line="4" pos:column="15">.<pos:position pos:line="4" pos:column="16"/></operator><name pos:line="4" pos:column="16">out<pos:position pos:line="4" pos:column="19"/></name><operator pos:line="4" pos:column="19">.<pos:position pos:line="4" pos:column="20"/></operator><name pos:line="4" pos:column="20">println<pos:position pos:line="4" pos:column="27"/></name></name><argument_list pos:line="4" 
pos:column="27">(<argument><expr><literal type="string" pos:line="4" pos:column="28">"Hello, World"<pos:position pos:line="4" pos:column="42"/></literal></expr></argument>)<pos:position pos:line="4" pos:column="43"/></argument_list></call></expr>;<pos:position pos:line="4" pos:column="44"/></expr_stmt> }<pos:position pos:line="5" pos:column="6"/></block></function> }<pos:position pos:line="6" pos:column="2"/></block></class></unit>

Reference: the srcML v0.9.5 documentation (see srcml --help). I use srcML regularly, including this feature, to obtain position information.
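As a rough illustration of how the position attributes can be consumed, here is a minimal Python sketch using only the standard library. It assumes the namespace URI and the pos:line / pos:column attributes exactly as they appear in the v0.9.5 output above; the function name and the shortened sample are made up for this example, and other srcML versions may lay out positions differently.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the srcML 0.9.5 output shown above.
POS_NS = "http://www.srcML.org/srcML/position"

# Shortened fragment of the output above (XML declaration omitted).
SAMPLE = (
    '<unit xmlns="http://www.srcML.org/srcML/src" '
    'xmlns:pos="http://www.srcML.org/srcML/position" '
    'language="Java" filename="input.java">'
    '<class><specifier pos:line="1" pos:column="1">public'
    '<pos:position pos:line="1" pos:column="7"/></specifier> class '
    '<name pos:line="1" pos:column="14">HelloWorld'
    '<pos:position pos:line="1" pos:column="24"/></name></class></unit>'
)


def start_positions(xml_text):
    """Return (tag, line, column, text) for every element with a start position."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        if elem.tag == f"{{{POS_NS}}}position":
            continue  # these empty elements mark where the enclosing node ends
        line = elem.get(f"{{{POS_NS}}}line")
        column = elem.get(f"{{{POS_NS}}}column")
        if line is None or column is None:
            continue  # structural elements such as <class> carry no position
        tag = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        found.append((tag, int(line), int(column), (elem.text or "").strip()))
    return found


for entry in start_positions(SAMPLE):
    print(entry)
```

For a complete file written by srcml, the same loop should work on the root returned by ET.parse(path).getroot() instead of ET.fromstring.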
