Removing unnecessary lines from a C++ file

Quite often, as I debug or reuse code, a file accumulates lines that no longer do anything, although they may have done something at some point.

Things like vectors that are populated but never read, classes/structs that are defined but never used, and functions that are declared but never called.

I understand that in many cases such things are not redundant, since they may be visible from other files, but in my case there are no other files, just some extraneous code in a single file.

I also understand that, technically, calling push_back does something, so the vector is not strictly unused on its own; but in my case its result is never used.

So: is there a way to do this with a compiler (Clang, GCC, VS, etc.) or an external tool?

Example:

    #include <vector>
    using namespace std;

    void test() {
        vector<int> a;
        a.push_back(1);
    }

    int main() {
        test();
        return 0;
    }

It should become:

    int main() { return 0; }

+7
4 answers

Our DMS Software Reengineering Toolkit with its C++11 front end could be used for this; it does not do it off the shelf at present. DMS is designed for building custom tools for arbitrary source languages and provides full parsers, name resolvers, and various flow analyzers to support analysis, as well as the ability to apply source-to-source transformations to the code based on the analysis results.

In general, you want a static analysis that determines, for each computation, whether its result is used (there may be several results; consider just "x++") or not. For each unused computation you essentially want to remove it and repeat the analysis. For efficiency, you want an analysis that determines all the points of use of each result only once; this is essentially a data flow analysis. When a computation result's set of uses is empty, that result can be deleted (note that deleting the unused value of "x++" may still leave the "x++", because the increment itself is still needed!), and the use sets of the computations it depends on can be adjusted to drop references from the deleted one, which can lead to further removals.
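
As a minimal C++ sketch of that last point (not DMS output, just an illustration): the value of an "x++" expression can be dead while the increment it performs is still live.

    #include <cstdio>

    int main() {
        int x = 0;
        // The *value* of this expression statement is never used, so that
        // result is dead; the increment itself is not, because x is read below.
        x++;
        // If x were never read again, the whole statement could go; here only
        // the unused result is dead, not the side effect.
        std::printf("%d\n", x);
        return 0;
    }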

To do this analysis for any language, you have to be able to track uses of results. For C (and C++) this can be pretty ugly: there are the "obvious" uses, where the result of a computation is consumed in an expression or assigned to a local/global variable (which is used elsewhere), and then there are indirect assignments through pointers, updates to object fields, accesses through arbitrary casts, and so on. To know these effects, a dead code removal tool has to be able to read the entire software system and compute the data flows across it.
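
Here is a small hypothetical C++ example (names are illustrative) of why indirect assignments through pointers make this hard: without a points-to analysis, a tool cannot tell which variable an indirect write actually affects.

    #include <cstdio>

    // 'counter' looks untouched inside update(), but its address escapes
    // through alias() and it is modified indirectly.
    static int counter = 0;

    static int* alias() {
        return &counter;      // the address escapes here
    }

    static void update(int* p) {
        *p += 1;              // indirect assignment: which object does this use?
    }

    int main() {
        int* p = alias();
        update(p);            // without points-to analysis, a tool cannot prove
                              // whether this write is dead or keeps 'counter' live
        std::printf("%d\n", counter);
        return 0;
    }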

To be safe, you want the analysis to be conservative: if the tool has no evidence that a result is unused, it should assume the result is used. You often have to do this with pointers (or array indexes, which are really just pointers in disguise), because in general you cannot determine exactly where a pointer points. Obviously, you could build a "safe" tool by assuming that all results are used :-} You also sometimes have to make very conservative, but necessary, assumptions for library routines for which you have no source. In that case it helps to have a set of precomputed side-effect summaries for the library (for example, strcmp has no side effects, sprintf overwrites a specific operand, push_back modifies its object, ...). Since libraries can be quite large, that list can be quite long.
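
As a rough illustration of what such precomputed summaries might look like (a hypothetical shape, not DMS's actual representation):

    #include <cstdio>
    #include <map>
    #include <string>

    // Hypothetical shape of a precomputed side-effect summary for library
    // routines; purely illustrative.
    struct EffectSummary {
        bool pure;             // no observable side effects (e.g. strcmp)
        bool writes_argument;  // overwrites one of its operands (e.g. sprintf)
        bool mutates_object;   // modifies the object it is called on (e.g. push_back)
    };

    static const std::map<std::string, EffectSummary> kLibrarySummaries = {
        {"strcmp",                 {true,  false, false}},
        {"sprintf",                {false, true,  false}},
        {"std::vector::push_back", {false, false, true }},
    };

    int main() {
        std::printf("strcmp pure? %d\n",
                    kLibrarySummaries.at("strcmp").pure ? 1 : 0);
        return 0;
    }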

DMS in general can parse an entire source code base, build symbol tables (so it knows which identifiers are local or global and their exact types), construct control flow and local data flow, compute local side-effect summaries per function, build a call graph and global side effects, and perform a global points-to analysis, delivering this usage information with the appropriate conservatism.

DMS has been used to carry out these computations on C systems of 26 million lines of code (and yes, it is a really big computation; it needed a 100 GB VM to run). We did not implement the dead-code-elimination part (that project had a different goal), but it is straightforward once you have this data. DMS has eliminated dead code in large Java code bases using a more conservative analysis (e.g., "no references to an identifier means its assignments are dead"), which removes a surprising amount of code in many real code bases.

The DMS C++ front end currently builds symbol tables and can do control flow analysis for C++98, with C++11 close at hand. We still need local data flow analysis, which takes some effort, but the global analysis machinery already exists in DMS and is available to be used for this purpose. ("No uses of an identifier" is readily available from the symbol table data, if you do not mind the more conservative analysis.)

In practice, you do not want the tool to remove dead code silently; some of it may actually be computations you want to keep anyway. What the Java tool does is produce two outputs: a list of the dead computations, which you can inspect to decide whether you believe it, and a version of the source with that code removed. If the dead code report looks right, you keep the version with the code deleted; if you see a "dead" computation that you think should not be dead, you modify the code so that it is not dead and run the tool again. With a large code base, you may also want to review the dead code report itself: how do you know that code which looks dead to you is not valued by someone else on your team? (Version control can be used to restore things if you goof!)

A really hard problem, which we cannot handle (and no tool I know of handles), is "dead code" in the presence of conditional compilation. (Java does not have this problem; C has it in spades, C++ somewhat less so.) It can get really nasty. Imagine a conditional compilation construct in which one arm has certain side effects and the other arm has different ones, or a case where one arm is seen by the GCC C++ compiler and the other by the MS compiler, and the compilers do not agree on what those constructs do (yes, C++ compilers disagree in dark corners). At best, we can only be very conservative here.
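
A minimal sketch of the kind of configuration-dependent liveness described above (the macro name is made up for illustration):

    #include <cstdio>

    static int log_count = 0;

    void record(int value) {
    #ifdef ENABLE_LOGGING
        log_count += 1;              // in this configuration, log_count is live
        std::printf("value=%d\n", value);
    #else
        (void)value;                 // in this configuration, log_count is never touched
    #endif
    }

    int main() {
        record(42);
        // A tool that only sees one preprocessed configuration cannot safely
        // conclude that log_count is dead in every configuration.
        std::printf("%d\n", log_count);
        return 0;
    }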

Clang has some ability to do flow analysis, and some ability to do source transformations, so it could conceivably be pressed into service for this. I do not know whether it can do global flow or points-to analysis; it seems biased toward single compilation units, since its main use is compiling one translation unit at a time.

+3

To catch unused variables, you can enable the -Wunused flag in the GCC compiler. This will warn you about unused parameters, variables, and computed values at compile time. I have found that building with -Wall -Wextra -Werror makes the compiler catch some of the problems you describe. More information can be found here: http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
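
A small example of the kind of code these flags catch (exact warning names and messages vary by GCC version):

    // Compile with:  g++ -Wall -Wextra unused.cpp
    // (adding -Werror turns these warnings into hard errors)

    int add(int a, int b, int unused_param) {   // warned by -Wunused-parameter
        int unused_local = 42;                  // warned by -Wunused-variable
        return a + b;
    }

    int main() {
        return add(1, 2, 3) == 3 ? 0 : 1;
    }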

As for finding unused classes, one option is to use an IDE such as Eclipse and its Find References feature to look for the places where a given class/object might be used.

+1

The short answer is no. From static analysis of the client code alone, it is impossible to determine that the vector's push_back method has no important side effects. For all an analysis tool knows, it could be logging to a database somewhere and driving stock trades.
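
To make that concrete, here is a hypothetical container with the same push_back interface whose call has an externally visible effect; a tool looking only at the call site cannot tell which behaviour it is dealing with.

    #include <cstdio>
    #include <vector>

    // Hypothetical type with the same push_back interface as std::vector,
    // but with an externally visible side effect.
    class AuditedVector {
    public:
        void push_back(int value) {
            data_.push_back(value);
            std::printf("audit: stored %d\n", value);  // visible side effect
        }
    private:
        std::vector<int> data_;
    };

    void test(AuditedVector& v) {
        v.push_back(1);   // removing this call would silently drop the audit record
    }

    int main() {
        AuditedVector v;
        test(v);
        return 0;
    }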

+1

I would recommend using version control software (SVN, Git, Mercurial, Perforce, ...) so that after debugging you can use the version control tool to find and remove the debugging leftovers. That makes it easier to keep the code lean.

In addition, this kind of leftover code usually has little test coverage, so if you have unit tests with coverage measurement, it should show up as uncovered code.

Then there are tools that look specifically for such things (lint, Coverity, etc.); most of them are commercial. Also try -O3 in GCC: the compiler may recognize more unused values that way, since it inlines and eliminates code more aggressively.

0
