Any software for pattern-matching and rewriting source code?

I have some old software (in a language that is not dead, but is dead to me ;-)) that implements a basic pattern-matching and rewriting system for source code. I am considering resurrecting this code, translating it into a modern language, and open-sourcing the project as a refactoring tool. Before I go much further, I want to know whether something like this already exists (my google-fu is failing me tonight).

Here's how it works:

  • the pattern-matching part matches source-code patterns spanning multiple lines of code, using a template with binding variables,
  • the rewriting part uses a template to rewrite the matched code, inserting the contents of the variables bound by the matching template,
  • matching and rewriting templates are associated (1:1) by a simple (unconditional) rewrite rule

The software operates on the abstract syntax tree (AST) of the input application and outputs a modified AST, which can then be regenerated into new source code.

For example, suppose we find a bunch of while-loops that really should be for-loops. The following template will match the while-loop pattern:

 Template oldLoopPtrn
     int @cnt@ = 0;
     while (@cnt@ < @max@)
     {
         … @body@
         ++@cnt@;
     }
 End_Template

while the following template will specify an output rewrite template:

 Template newLoopPtrn
     for(int @cnt@ = 0; @cnt@ < @max@; @cnt@++)
     {
         @body@
     }
 End_Template

and a simple rule to link them

 Rule oldLoopPtrn --> newLoopPtrn 

so that code which looks like this:

 int i=0;
 while(i<arrlen)
 {
     printf("element %d: %f\n",i,arr[i]);
     ++i;
 }

is automatically rewritten to look like this:

 for(int i = 0; i < arrlen; i++)
 {
     printf("element %d: %f\n",i,arr[i]);
 }
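
The match, bind, and rewrite cycle described above can be sketched on a toy AST. This is only an illustration in Python, not the original tool: the tuple encoding of nodes, the node kinds ("seq", "decl", "preinc", and so on), and the function names are all assumptions made for the example, and @body@ binds a single statement here rather than a statement sequence.

```python
# Toy AST (hypothetical encoding): nodes are tuples ("kind", child, ...),
# leaves are strings. A leaf written "@name@" in a template is a binding variable.

def is_var(node):
    return (isinstance(node, str) and len(node) > 2
            and node.startswith("@") and node.endswith("@"))

def match(pattern, tree, bindings=None):
    """Return a dict of variable bindings if tree matches pattern, else None."""
    bindings = {} if bindings is None else bindings
    if is_var(pattern):
        if pattern in bindings:
            # A repeated variable must bind to the same subtree every time.
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None

def substitute(template, bindings):
    """Build the output tree by filling the template's variables."""
    if is_var(template):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(child, bindings) for child in template)
    return template

def rewrite(tree, pattern, template):
    """Apply the rule pattern --> template bottom-up, everywhere in tree."""
    if isinstance(tree, tuple):
        tree = tuple(rewrite(child, pattern, template) for child in tree)
    found = match(pattern, tree)
    return substitute(template, found) if found is not None else tree

# Hypothetical encodings of oldLoopPtrn and newLoopPtrn.
OLD_LOOP = ("seq",
            ("decl", "int", "@cnt@", "0"),
            ("while", ("<", "@cnt@", "@max@"),
             ("block", "@body@", ("preinc", "@cnt@"))))
NEW_LOOP = ("for",
            ("decl", "int", "@cnt@", "0"),
            ("<", "@cnt@", "@max@"),
            ("postinc", "@cnt@"),
            ("block", "@body@"))

source = ("func", "main",
          ("seq",
           ("decl", "int", "i", "0"),
           ("while", ("<", "i", "arrlen"),
            ("block", ("call", "printf", "i"), ("preinc", "i")))))

result = rewrite(source, OLD_LOOP, NEW_LOOP)
print(result)
```

Because @cnt@ appears several times in the matching template, the matcher only succeeds when every occurrence binds to the same subtree, which is what keeps an unrelated while-loop (one that tests a different variable) from being rewritten.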

The closest I've seen are some of the code refactoring tools, but they seem geared toward interactively rewriting selected snippets rather than making wholesale automated changes.

I believe that this kind of tool could supercharge refactoring and would work across several languages (even HTML/CSS). I also believe that converting and polishing the code base would be a huge project that I simply cannot do alone in any reasonable amount of time.

So, is there anything like this out there already? If not, are there any obvious features (other than rewrite-rule conditionals) to consider?

EDIT: The one feature of this system that I really like is that the template patterns are fairly obvious and easy to read, because they are written in the same language as the target source code, not in some esoteric mutated regex/BNF format.

+7
language-agnostic pattern-matching templates
5 answers

I am considering the possibility of resurrecting this code, translating it into a modern language, and open-sourcing a project as a refactoring tool.

I think it would be great if such a tool were freely available.

However, there is a commercial product: the DMS Software Reengineering Toolkit.

The DMS Software Reengineering Toolkit is a set of tools for automating customized analysis, modification, or translation of source programs, or generation of software systems, containing arbitrary mixtures of languages ("domains"). The term "software" for DMS is very broad and covers any formal notation, including programming languages, markup languages, hardware description languages, design notations, data descriptions, etc. This toolkit is the first step towards implementing The Design Maintenance System™, an ambitious vision of a 21st-century software engineering environment that supports the incremental construction and maintenance of large, semantics-driven application systems.

In addition, there is the Coccinelle tool for C source code:

Coccinelle is a program matching and transformation engine that provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code. Coccinelle was initially targeted at performing collateral evolutions in Linux. Such evolutions comprise the changes that are needed in client code in response to evolutions in library APIs, and may include modifications such as renaming a function, adding a function argument whose value is somehow context-dependent, and reorganizing a data structure. Beyond collateral evolutions, Coccinelle has been successfully used (by us and others) for finding and fixing bugs in systems code.

+3

TXL is rule-based, not template-based, so it has more features, but perhaps a steeper learning curve.

+1

The Coccinelle equivalent of the above rule would be:

 @@
 identifier cnt;
 expression max, E;
 @@

 cnt = 0
 ... when != cnt = E
 -while (cnt < max)
 +for (cnt = 0; cnt < max; cnt++)
 {
   ...
 - cnt++;
 }

For C code:

 int main () {
   int i = 0;
   printf("hello\n");
   while (i < arrlen) {
     printf("element %d: %f\n", i, arr[i]);
     ++i;
   }
 }

It gives:

 int main () {
   int i = 0;
   printf("hello\n");
   for (i = 0; i < arrlen; i++) {
     printf("element %d: %f\n", i, arr[i]);
   }
 }

The "... when != cnt = E" line allows arbitrary code between the initialization of cnt and the while loop, but checks that cnt is not reassigned in between. A more complex rule could also remove the initialization of cnt if it is not used between the initialization and the while loop.
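
To make the idea behind that constraint concrete, here is a minimal Python sketch of the same check over a flat list of statements. It is not how Coccinelle works internally (Coccinelle reasons over control-flow paths); the statement tuples and the function name are hypothetical and exist only for this illustration.

```python
# Hypothetical flat statement encoding:
#   ("assign", name, value), ("call", name), ("while", counter, limit)

def find_init_loop_pairs(stmts):
    """Return (init_index, loop_index) pairs where a counter is set to 0,
    later tested by a while loop, and not reassigned in between."""
    pairs = []
    for i, stmt in enumerate(stmts):
        if stmt[0] != "assign" or stmt[2] != "0":
            continue
        counter = stmt[1]
        for j in range(i + 1, len(stmts)):
            later = stmts[j]
            if later[0] == "assign" and later[1] == counter:
                break  # counter reassigned: the "when !=" condition fails
            if later[0] == "while" and later[1] == counter:
                pairs.append((i, j))
                break
    return pairs

ok = [("assign", "i", "0"), ("call", "printf"), ("while", "i", "arrlen")]
bad = [("assign", "i", "0"), ("assign", "i", "5"), ("while", "i", "arrlen")]

print(find_init_loop_pairs(ok))
print(find_init_loop_pairs(bad))
```

The first list corresponds to the C example above (an unrelated printf call sits between the initialization and the loop, so the match is allowed); the second list reassigns the counter first, so no pair is reported.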

+1

Our DMS has already been mentioned. It has transformation rules of the kind the OP wants: ones that are "easy to read because they are written in the same language as the target source code."

Here is a link that shows a complete, detailed example of pattern matching / conversion using DMS.

+1

In TXL, it looks like this:

 include "c.grm"

 rule main
     replace [declaration_or_statement*]
         int cnt [id] = 0;
         while (cnt < max [shift_expression] )
         {
             body [declaration_or_statement*]
         }
     deconstruct * body
         ++cnt;
     by
         for (int cnt = 0; cnt < max; cnt++)
         {
             body
         }
 end rule

For this input:

 int main () {
     int i=0;
     while(i < arrlen) {
         printf("element %d: %f\n",i,arr[i]);
         ++i;
     }
 }

This gives the following:

 int main () {
     for (int i = 0; i < arrlen; i++) {
         printf ("element %d: %f\n", i, arr [i]);
         ++i;
     }
 }

To handle the separated case shown in the Coccinelle example above (other code between the initialization and the loop), you would add a similar parameter to this rule.

+1