Source-to-source manipulation of C code

I need to do some source-to-source manipulation in the Linux kernel. I tried using Clang for this purpose, but there is a problem: Clang preprocesses the source code, that is, it expands macros and includes. As a result, Clang sometimes produces C code that is broken from the perspective of the Linux kernel. I cannot maintain all the changes manually, as I expect thousands of changes per file.

I tried ANTLR, but the publicly available grammars are incomplete and not suitable for projects such as the Linux kernel.

So my question is the following: is there any way to do source-to-source manipulation of C code without preprocessing it?

Assume the following code.

    #define AAA 1
    void f1(int a){
        if(a == AAA)
            printf("hello");
    }

After applying the source-to-source manipulation, I want to get this:

    #define AAA 1
    void f1(int a){
        if(functionCall(a == AAA))
            printf("hello");
    }

But Clang, for example, produces the following code, which does not meet my requirements because it expands the AAA macro:

    #define AAA 1
    void f1(int a){
        if(functionCall(a == 1))
            printf("hello");
    }

I hope I was clear enough.

Edit

The above code is just an example. The source-to-source manipulation that I want to do is not limited to rewriting if() statements; it also includes inserting a unary operator in front of an expression, replacing an arithmetic expression with its positive or negative value, etc.

Solution

There is one solution that I have found for myself. I use gcc to produce preprocessed source code and then run Clang on the result. That way I have no problems with the expansion of macros and includes, since that work is done by gcc. Thanks for the answers!

+4
5 answers

You could consider Coccinelle (http://coccinelle.lip6.fr/): it provides a nice semantic patching framework.

+4

The idea would be to replace all occurrences of

 if(a == AAA) 

with

 if(functionCall(a == AAA)) 

You can do this easily with, for example, the sed tool.

If you have a finite set of patterns to replace, you can write a sed script to perform the replacement.

Does this solve your problem?
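To make the textual-replacement idea concrete, here is a minimal sketch written in C rather than sed, so that nested parentheses in the condition can be matched. The function name wrap_if_conditions and its helpers are hypothetical, invented for this illustration. The sketch never runs the preprocessor, so AAA stays unexpanded, but it also does not understand comments or string literals, which is where a purely textual approach tends to break down.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to dst; returns 0 on success, -1 if dst is too small. */
static int emit(char *dst, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
    return 0;
}

static int is_ident(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* True if position i is preceded by '#' (ignoring blanks), as in "#if". */
static int follows_hash(const char *src, size_t i)
{
    while (i > 0 && (src[i - 1] == ' ' || src[i - 1] == '\t'))
        i--;
    return i > 0 && src[i - 1] == '#';
}

/*
 * Hypothetical helper: copy src to dst, turning every "if (COND)" into
 * "if (wrapper(COND))". Returns the number of rewrites, or -1 on unbalanced
 * parentheses or a too-small buffer. Comments and string literals are NOT
 * recognised; this is a sketch, not a C parser.
 */
int wrap_if_conditions(const char *src, const char *wrapper,
                       char *dst, size_t cap)
{
    size_t i = 0, len = 0;
    int count = 0;

    assert(src && wrapper && dst && cap > 0);
    dst[0] = '\0';
    while (src[i] != '\0') {
        size_t j = i + 2, k;
        int depth = 0;
        /* "if" must be a whole word and must not be the directive "#if". */
        int is_if = src[i] == 'i' && src[i + 1] == 'f' &&
                    !is_ident(src[i + 2]) &&
                    (i == 0 || !is_ident(src[i - 1])) &&
                    !follows_hash(src, i);

        while (is_if && (src[j] == ' ' || src[j] == '\t'))
            j++;
        if (!is_if || src[j] != '(') {
            /* Not an if statement: copy one character unchanged. */
            if (emit(dst, cap, &len, &src[i], 1) != 0)
                return -1;
            i++;
            continue;
        }
        /* Find the parenthesis that matches the one at position j. */
        k = j;
        do {
            if (src[k] == '(')
                depth++;
            else if (src[k] == ')')
                depth--;
            k++;
        } while (depth > 0 && src[k] != '\0');
        if (depth != 0)
            return -1;
        /* src[j+1 .. k-2] is the condition; src[k-1] is the closing ')'. */
        if (emit(dst, cap, &len, &src[i], j - i + 1) != 0 ||
            emit(dst, cap, &len, wrapper, strlen(wrapper)) != 0 ||
            emit(dst, cap, &len, "(", 1) != 0 ||
            emit(dst, cap, &len, &src[j + 1], k - j - 2) != 0 ||
            emit(dst, cap, &len, "))", 2) != 0)
            return -1;
        i = k;
        count++;
    }
    return count;
}
```

Called on the snippet from the question with the wrapper name "functionCall", this turns if(a == AAA) into if(functionCall(a == AAA)) and leaves the #define line untouched. Other transformations mentioned in the question, such as rewriting arbitrary arithmetic expressions, need real parsing and are beyond what this kind of pattern matching can do reliably.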

+2

Handling the preprocessor is one of the most difficult problems when applying transformations to C (and C++).

Our DMS Software Reengineering Toolkit with its C Front End comes relatively close to this. DMS can parse C source code while preserving most preprocessor conditionals, macro definitions, and macro uses.

It does this by allowing preprocessor actions in "well-structured" places. Examples: #defines are allowed wherever declarations or statements can occur; macro calls and conditionals are allowed as replacements for many of the nonterminals in the language (e.g., function head, statement, expression, declarations), and in many unstructured places where people commonly put them (e.g., "#if foo", "if (...) {", "#endif" on three consecutive lines). It parses the source code and the preprocessor directives as if they were part of one language (they ARE; it is called "C") and builds the corresponding ASTs, which can be transformed and regenerated correctly with the captured preprocessor directives. [This level of capability handles the OP's example perfectly.]
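The difference between a well-structured and an unstructured directive placement can be illustrated with plain C. This is only an illustration of the two shapes; it is not DMS input or output, and the macro USE_GUARD and both function names are hypothetical.

```c
#define USE_GUARD 1 /* hypothetical configuration macro */

/* Well-structured: the conditional encloses one complete statement, so a
 * parser can treat the whole #if ... #endif region as an optional statement. */
int structured(int a)
{
    int hits = 0;
#if USE_GUARD
    if (a > 0) {
        hits++;
    }
#endif
    return hits;
}

/* Unstructured: the first conditional encloses only the head of the if
 * statement and a second one encloses its closing brace, so neither region
 * corresponds to a complete piece of C syntax on its own. */
int unstructured(int a)
{
    int hits = 0;
#if USE_GUARD
    if (a > 0) {
#endif
        hits++;
#if USE_GUARD
    }
#endif
    return hits;
}
```

With USE_GUARD set to 1 both functions behave identically. Only the first form maps directly onto a single syntax-tree node, which is why the second kind of placement is harder for a tool to preserve.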

Some directives are poorly placed (both in the syntactic sense, e.g., spanning several fragments of the language, and in the "you have got to be kidding" sense of understandability). DMS handles these by expanding them, with some guidance from the engineer ("always expand this macro"). A less satisfactory approach is to hand-convert the unstructured preprocessor conditionals and macro calls into structured ones; this is a bit painful, but more workable than one might expect, since the bad cases occur much less frequently than the good ones.

To do better than this, one needs symbol tables and flow analysis that take the preprocessor conditions into account and capture all the preprocessor conditionals. We have done experimental work with DMS to capture conditional declarations in the symbol table (this seems to work fine), and we are just starting work on a scheme for the latter.

It's not easy being green.

0

Clang maintains extremely accurate information about the source code.

In particular, the SourceManager can tell whether a given token was expanded from a macro or written as is, and Chandler Carruth recently implemented macro diagnostics that can display the actual macro expansion stack (at the various stages of expansion), tracing back to the code as it was actually written (Clang 3.0).

Therefore, you can use the generated AST and then rewrite the source code with all its macros still in place. You would have to query virtually every node to find out whether it comes from a macro expansion and, if it does, retrieve the original text of the expansion, but it still seems possible.

  • Clang has a rewriter module
  • You can dig through Chandler's code for the macro diagnostics stack

So I think you should have everything you need :) (And I hope so, because I cannot help much more :p)

0

I would recommend the ROSE framework. Its source is available on GitHub.

0