Automatic deduplication of assembly language code?

I watched several assembly videos to better understand how to manually optimize * .s files left over after compilation with gcc/g++ -S ... One of the topics discussed was Refactoring redundant code , which demonstrates how to move redundant code to your own labeled block ending with ret and replacing it with call .

The example shown in the video is two blocks containing:

 mov eax,power mul ebx mov power,eax inc count 

which he replaces with call CalculateNextPower , and CalculateNextPower looks like this:

 CalculateNextPower: mov eax,power mul ebx mov power,eax inc count ret 

Out of curiosity, trying to reduce the compiled size, I compiled some C and C ++ projects with -S and various optimizations, including -Os, -O2, -O3, -pipe, -combine and -fwhole-program, and analyzed the result * .s for redundancy using a slightly corrected (for .s files) version of duplo . Only the -fwhole-program (now obsolete IIRC) had a significant impact on the elimination of duplicate code for files (I assume that its replacement (-s) -flto will behave similarly during the link - roughly equivalent to compiling with -ffunction-sections -fdata -sections and linking with -gc partitions), but still skips significantly large blocks of code.

Manual optimization using duplo output reduced the size by ~ 10% in a random C project and almost 30% in a random C ++ project, when only adjacent assembly blocks were deduplicated with at least 5 adjacent duplicate instructions.

I miss the compiler option (or even a stand-alone tool) that automatically removes redundant assembly when compiling for size (including other compilers: clang, icc, etc.) or is this function missing (for some reason?)

If this does not exist, duplo could be modified to ignore lines beginning with the character '.' or ';' (and others?) and replace duplicate code blocks with function calls with duplicate code, but I am open to other sentences that will work directly with the internal representations of the compiler (preferably clang or gcc).

Edit: I fixed duplication to identify duplicate assembly blocks here , but manual refactoring is still required for this. As long as the same compiler is used to generate the code, it may be possible (but probably slower) to identify the largest blocks of the duplicated code, put them in your own “function” block and replace the code with CALL with this block.

+7
compiler-optimization assembly size code-duplication
source share
2 answers

What you need is a clone checker tool .

They exist in many implementations, which vary depending on the detail of the elements of the processed document and the volume of the structure.

Those that match the source string (won't work for you, you want to parameterize your routines using different constants [and data and index offset], or named locations or other named subintits). Token-based detectors can work because they will identify single-point locations (e.g., constants or identifiers) that differ. But what you really want is structural matches that can choose addressing options or even code options in the middle of a block (see AST-based clone detectives, which I think I built).

To discover with a structure, you must have a structure. Fortunately, even assembler language code has a grammar structure, and code blocks separated by subroutine entries and outputs (the latter are more difficult to detect in an assembly because there can be more than one).

When you discover the use of structures, you have at least the ability to use the structure to modify the code. But if you have the source of the program presented as a tree, you have a structure (subtrees and sequences of subtrees) by which you can detect clones, and you can abstract clone matches by modifying the trees at the points of coincidence. (Earlier versions of my clone detector for COBOL abstracted clones into the COPY libraries. We stopped doing this mainly because you did not want to abstract each clown in this way).

+2
source share

What you offer is called procedural abstraction and has been implemented by more than one group as research projects. Here is one. Here is another. And further.

Clone detection is commonly used in the context of source code, although its function is similar. Since procedural abstraction occurs at a lower level, it can achieve more. For example, suppose there are two calls for different functions, but with exactly the same complex argument calculations. A procedural abstraction can output the argument to a procedure, but it would be difficult for a clone detector to do this.

I do not believe gcc or llvm currently supports a supported PA. I searched both sets of documents and did not find them. In at least two cases above, the optimizer runs on the assembly code generated by gcc, and not as internal gcc optimization. This probably explains why these methods were not built into the compiler. You can ask the authors to see where their implementation is.

0
source share

All Articles