Transcoding to another language

Question


What are the typical ways to transpile code into another language? I am currently writing a simple programming language, and my code generator works recursively. It loops over the list of nodes, and when the current node is a variable node, it calls the emit_variable_node function, which literally appends some code to the output string, for example:

The following code is pseudo-ish; I am writing my project in C and compiling to C.

```c
char *file_contents;

void emit_variable_node(VariableNode *var) {
    // I know += doesn't work on strings, just pretend it does.
    file_contents += var.getType();
    file_contents += " "; // a space
    file_contents += var.getName();
    // etc
}
```

I also assume that the input code has already been parsed and checked for semantic correctness. The file_contents string is then written to a temporary file, which is deleted after the C compiler has compiled it.

Is this bad practice, or are there better, cleaner ways to do this?


c compiler-construction

chapman Mar 10 '15 at 15:49


1 answer

Ira Baxter · Accepted Answer · 2015-03-10T17:34:31+0000

You can write a parser any way you like and generate code while parsing; no AST nodes are needed ("syntax-directed translation"). This usually produces pretty awful code, because the code generator cannot take any context into account to generate better code.
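A minimal sketch of syntax-directed translation in C, assuming a hypothetical toy expression grammar (single-character operands, `+ - * /`, parentheses, no whitespace) and a stack-machine target written in postfix. The recursive-descent functions `expr`, `term`, `factor` and the entry point `translate` are illustrative names, not part of any real tool. Each operator is emitted the moment its operands have been parsed and nothing is remembered, so the generator has no context to work with.

```c
#include <ctype.h>
#include <string.h>

/* Toy grammar (hypothetical), parsed by recursive descent:
 *   expr   -> term   (('+' | '-') term)*
 *   term   -> factor (('*' | '/') factor)*
 *   factor -> LETTER | DIGIT | '(' expr ')'
 * Output is postfix code for a stack machine, tokens separated by spaces. */

static const char *src;  /* current position in the input */
static char out[256];    /* generated code */

/* Append one token to the output, silently truncating when full. */
static void emit(char c) {
    size_t n = strlen(out);
    if (n + 3 > sizeof out) return;
    if (n > 0) out[n++] = ' ';
    out[n++] = c;
    out[n] = '\0';
}

static void expr(void);

static void factor(void) {
    if (*src == '(') {
        src++;
        expr();
        if (*src == ')') src++;
    } else if (isalnum((unsigned char)*src)) {
        emit(*src++);            /* operands are emitted immediately */
    }
}

static void term(void) {
    factor();
    while (*src == '*' || *src == '/') {
        char op = *src++;
        factor();
        emit(op);                /* operator follows both operands */
    }
}

static void expr(void) {
    term();
    while (*src == '+' || *src == '-') {
        char op = *src++;
        term();
        emit(op);
    }
}

/* Translate an infix expression without spaces into postfix code. */
const char *translate(const char *input) {
    src = input;
    out[0] = '\0';
    expr();
    return out;
}
```

For example, `translate("(a+b)*c")` returns `"a b + c *"`. Because output happens inside the parser, `1+2` is emitted as `1 2 +` even though the result is a constant; noticing that would require looking at more than one construct at a time.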

You can build a parser that constructs an abstract syntax tree (AST) as a first pass, and then have a tree-walking code generator emit code for each node in a second pass, without looking at any neighboring nodes. This is just the previous approach with an AST added. Here is a stunningly bad example of unoptimized transpiler output produced by something like this.
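The two-pass version can be sketched like this, assuming a hypothetical `Node` layout for expressions. `gen` walks the tree and each node emits its text without knowing its parent or siblings, so the only safe choice is to parenthesize every binary operation. The type and function names are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for a tiny expression language. */
typedef enum { NUM, VAR, BINOP } Kind;

typedef struct Node {
    Kind kind;
    int value;              /* used when kind == NUM   */
    const char *name;       /* used when kind == VAR   */
    char op;                /* used when kind == BINOP */
    struct Node *lhs, *rhs; /* used when kind == BINOP */
} Node;

static char out[512];       /* generated C text */

/* Append s to out, silently truncating if the buffer is full. */
static void put(const char *s) {
    strncat(out, s, sizeof out - strlen(out) - 1);
}

/* Second pass: each node emits its own text and never inspects its
   parent or siblings, so every BINOP is wrapped in parentheses. */
static void gen(const Node *n) {
    char buf[32];
    switch (n->kind) {
    case NUM:
        snprintf(buf, sizeof buf, "%d", n->value);
        put(buf);
        break;
    case VAR:
        put(n->name);
        break;
    case BINOP:
        put("(");
        gen(n->lhs);
        snprintf(buf, sizeof buf, " %c ", n->op);
        put(buf);
        gen(n->rhs);
        put(")");
        break;
    }
}

/* Generate C source for one expression tree. */
const char *generate(const Node *root) {
    out[0] = '\0';
    gen(root);
    return out;
}
```

Generating `a + b * c` from its AST yields `(a + (b * c))`: correct, but the redundant parentheses show that no node knew anything about its surroundings.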

Better is to generate code from the AST, where the code generator for each AST node inspects its neighbors to decide what to do. This will give you somewhat better code.
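One simple form of neighbor awareness is to let the parent tell each child how tightly it binds. The sketch below uses a hypothetical expression `Node` and assumes all four operators are left-associative with the usual C precedence; `gen` receives the minimum precedence the surrounding context requires and adds parentheses only when the child falls below it. Names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for a tiny expression language. */
typedef enum { NUM, VAR, BINOP } Kind;

typedef struct Node {
    Kind kind;
    int value;              /* NUM   */
    const char *name;       /* VAR   */
    char op;                /* BINOP: '+', '-', '*', '/' */
    struct Node *lhs, *rhs; /* BINOP */
} Node;

static char out[512];

static void put(const char *s) {
    strncat(out, s, sizeof out - strlen(out) - 1);
}

/* Usual C precedence for the four operators. */
static int prec(char op) {
    return (op == '*' || op == '/') ? 2 : 1;
}

/* min_prec comes from the parent: a BINOP binding less tightly than
   that must be parenthesized, otherwise parentheses are omitted. */
static void gen(const Node *n, int min_prec) {
    char buf[32];
    switch (n->kind) {
    case NUM:
        snprintf(buf, sizeof buf, "%d", n->value);
        put(buf);
        break;
    case VAR:
        put(n->name);
        break;
    case BINOP: {
        int p = prec(n->op);
        int paren = p < min_prec;
        if (paren) put("(");
        gen(n->lhs, p);         /* left-assoc: equal precedence is safe on the left */
        snprintf(buf, sizeof buf, " %c ", n->op);
        put(buf);
        gen(n->rhs, p + 1);     /* on the right, a - (b - c) needs its parentheses */
        if (paren) put(")");
        break;
    }
    }
}

const char *generate(const Node *root) {
    out[0] = '\0';
    gen(root, 0);
    return out;
}
```

With this change, `a + b * c` is emitted without parentheses, while `(a + b) * c` and `a - (b - c)` keep the ones they need.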

Better still is to follow traditional compilers: build a good front end for your language, including symbol tables as well as control-flow and data-flow analysis. You can then use this information to generate much better code.
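A symbol table is the first piece of such a front end. The sketch below is a minimal, hypothetical design (a flat stack of entries tagged with their scope depth, fixed capacity, no duplicate checking); real compilers often use a hash table per scope instead. A code generator can call `lookup` to learn a variable's declared type rather than guessing from the node alone. All names are illustrative.

```c
#include <string.h>

/* One declared name; depth records the scope it was declared in. */
typedef struct {
    const char *name;
    const char *type;
    int depth;
} Symbol;

static Symbol table[128];
static int count = 0;    /* number of live entries */
static int depth = 0;    /* current scope nesting level */

void enter_scope(void) { depth++; }

/* Leaving a scope drops every symbol declared inside it. */
void exit_scope(void) {
    while (count > 0 && table[count - 1].depth == depth)
        count--;
    depth--;
}

/* Returns 0 if the (fixed-size) table is full. */
int declare(const char *name, const char *type) {
    if (count == (int)(sizeof table / sizeof table[0]))
        return 0;
    table[count++] = (Symbol){ name, type, depth };
    return 1;
}

/* Search from the top so inner declarations shadow outer ones.
   Returns the declared type, or NULL if the name is not visible. */
const char *lookup(const char *name) {
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return table[i].type;
    return NULL;
}
```

Because scopes are popped in order, shadowing falls out naturally: after `exit_scope()`, `lookup` sees the outer declaration again.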

As for the actual code generation: yes, you can just print text strings. String templates are a little more convenient, but they are just a fancy way to print text strings, so they don't add any power or improve the quality of the generated code.
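Printing text strings in C needs a growable buffer, which is what the question's pretend `file_contents += ...` stands for. This sketch assumes C99 `vsnprintf` semantics (a `NULL` buffer with size 0 returns the required length); `StrBuf`, `sb_printf` and the two-field `VariableNode` are hypothetical, since the question does not show its node layout.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Growable output buffer; a zero-initialized StrBuf is empty. */
typedef struct {
    char *data;
    size_t len, cap;
} StrBuf;

/* printf-style append. Returns 0 on formatting or allocation failure. */
int sb_printf(StrBuf *sb, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);   /* measure first */
    va_end(ap);
    if (n < 0) return 0;
    if (sb->len + (size_t)n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 64;
        while (sb->len + (size_t)n + 1 > cap) cap *= 2;
        char *p = realloc(sb->data, cap);
        if (!p) return 0;
        sb->data = p;
        sb->cap = cap;
    }
    va_start(ap, fmt);
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);  /* then write */
    va_end(ap);
    sb->len += (size_t)n;
    return 1;
}

/* Hypothetical node layout; the question does not show VariableNode. */
typedef struct {
    const char *type;
    const char *name;
} VariableNode;

/* The question's emitter, written against the buffer. */
void emit_variable_node(StrBuf *out, const VariableNode *var) {
    sb_printf(out, "%s %s;\n", var->type, var->name);
}
```

After generation, `sb.data` holds the complete C source and can be written to the temporary file with `fputs`; release it with `free(sb.data)`.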

The best approach is to translate the AST of the source language into an AST of your target language, using all the local checks and the information from the symbol table and flow analysis. A nice consequence is that, once you have an AST in the target language, you can apply optimizations in the target language that are not possible in the source language. [Real compilers do something similar, but the term they use is "translate the AST to IR (intermediate representation)", and they do the optimization on the IR.] After all optimizations on the target AST are complete, you pretty-print the final AST ... using something like string templates.
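The translate, optimize, pretty-print pipeline can be sketched end to end with two tiny hypothetical ASTs. The source language is assumed to have a `SQUARE` operator with no direct C counterpart, which the translator rewrites into a C multiplication; constant folding then runs on the C AST, and a printer turns the result into text. All type and function names are made up for illustration, nodes come from a fixed-size arena, and constant folding stands in for whatever target-level optimizations a real translator would perform.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical source AST: numbers, variables, '+', and SQUARE(e). */
typedef enum { S_NUM, S_VAR, S_ADD, S_SQUARE } SrcKind;
typedef struct Src {
    SrcKind kind; int value; const char *name; struct Src *a, *b;
} Src;

/* Target AST: just enough of C expressions for this example. */
typedef enum { C_NUM, C_VAR, C_BIN } CKind;
typedef struct CExpr {
    CKind kind; int value; const char *name; char op; struct CExpr *l, *r;
} CExpr;

static CExpr pool[64];   /* toy arena, reset by compile() */
static int used;

static CExpr *mk(CKind k, int v, const char *name, char op, CExpr *l, CExpr *r) {
    if (used == 64) abort();               /* arena exhausted */
    CExpr *e = &pool[used++];
    *e = (CExpr){ k, v, name, op, l, r };
    return e;
}

/* Step 1: source AST -> target AST. */
static CExpr *translate(const Src *s) {
    switch (s->kind) {
    case S_NUM: return mk(C_NUM, s->value, NULL, 0, NULL, NULL);
    case S_VAR: return mk(C_VAR, 0, s->name, 0, NULL, NULL);
    case S_ADD: return mk(C_BIN, 0, NULL, '+', translate(s->a), translate(s->b));
    case S_SQUARE: {
        CExpr *t = translate(s->a);        /* SQUARE(t) becomes t * t */
        return mk(C_BIN, 0, NULL, '*', t, t);
    }
    }
    return NULL;
}

/* Step 2: optimize on the target AST (constant folding; builds new nodes). */
static CExpr *fold(CExpr *e) {
    if (e->kind != C_BIN) return e;
    CExpr *l = fold(e->l), *r = fold(e->r);
    if (l->kind == C_NUM && r->kind == C_NUM) {
        int v = e->op == '+' ? l->value + r->value : l->value * r->value;
        return mk(C_NUM, v, NULL, 0, NULL, NULL);
    }
    return mk(C_BIN, 0, NULL, e->op, l, r);
}

/* Step 3: pretty-print; nested binary expressions get parentheses. */
static char out[256];
static void put(const char *s) { strncat(out, s, sizeof out - strlen(out) - 1); }

static void pp(const CExpr *e, int nested) {
    char buf[32];
    if (e->kind == C_NUM) { snprintf(buf, sizeof buf, "%d", e->value); put(buf); }
    else if (e->kind == C_VAR) put(e->name);
    else {
        if (nested) put("(");
        pp(e->l, 1);
        snprintf(buf, sizeof buf, " %c ", e->op);
        put(buf);
        pp(e->r, 1);
        if (nested) put(")");
    }
}

const char *compile(const Src *program, int optimize) {
    used = 0;
    out[0] = '\0';
    CExpr *t = translate(program);
    if (optimize) t = fold(t);
    pp(t, 0);
    return out;
}
```

For the source expression `SQUARE(2 + 1) + x`, `compile(&prog, 0)` returns `((2 + 1) * (2 + 1)) + x` and `compile(&prog, 1)` returns `9 + x`. The optimizer works on the C-level multiplication that only exists after translation, and printing stays a separate final step.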

Most people do not have the energy to build a good transpiler from scratch, so they do something hacky, like the first approach (just emit text while parsing). But if you want a good foundation for converting code from one language to another, check out our DMS Software Reengineering Toolkit. DMS has parsers for many languages, can implement parsers for custom languages, automatically builds ASTs, and provides strong support for Life After Parsing, such as symbol table construction and flow analysis; it also performs AST-to-AST transformations and has pretty-printers. DMS is designed as a platform to support exactly these kinds of tasks. This means you can focus on producing a high-quality translation, rather than on building all of this supporting infrastructure.
