How to check if binary files are created from specific sources

The inherited project I'm working on includes an external library as a set of jar binaries. For analysis and potential fixes, we decided to get the library's sources, use them to build new binaries, and then put those binaries through detailed and rather lengthy regression testing.

Suppose we have already obtained and built the sources (I'm actually still in the planning stage). Before the actual testing, I would like to perform some “compatibility checks” to rule out the possibility that the sources differ significantly from what is in the “old” binaries.

Using the javap tool, I was able to extract what I believe is the version of the JDK used for compilation. It states that the binaries were created with major version 46, minor version 0. According to this article, that maps to JDK 1.2.
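The version javap reports is stored in the first eight bytes of every class file: the magic number 0xCAFEBABE, then minor_version, then major_version. When checking many files, it can be read directly. The sketch below is only an illustration; the class name `ClassVersion` and the `jdkFor` mapping helper are hypothetical, and the mapping covers only the well-known values (45 for 1.0/1.1, 46 for 1.2, 47 for 1.3, 48 for 1.4, and major − 44 from Java 5 onward).

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassVersion {
    /** Returns {major, minor} parsed from the first 8 bytes of a class file. */
    public static int[] parseHeader(byte[] h) {
        if (h.length < 8) throw new IllegalArgumentException("header too short");
        int magic = ((h[0] & 0xFF) << 24) | ((h[1] & 0xFF) << 16) | ((h[2] & 0xFF) << 8) | (h[3] & 0xFF);
        if (magic != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
        int minor = ((h[4] & 0xFF) << 8) | (h[5] & 0xFF); // minor_version comes before major_version
        int major = ((h[6] & 0xFF) << 8) | (h[7] & 0xFF);
        return new int[] {major, minor};
    }

    /** Hypothetical helper: maps a major version to the JDK release that introduced it. */
    public static String jdkFor(int major) {
        switch (major) {
            case 45: return "1.0/1.1";
            case 46: return "1.2";
            case 47: return "1.3";
            case 48: return "1.4";
            default: return major > 48 ? String.valueOf(major - 44) : "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] header = new byte[8];
            try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
                in.readFully(header); // only the header is needed
            }
            int[] v = parseHeader(header);
            System.out.println(path + ": " + v[0] + "." + v[1] + " -> JDK " + jdkFor(v[0]));
        }
    }
}
```

Usage: `java ClassVersion Foo.class Bar.class`. Note that the major version reflects the compiler's target setting, so a newer JDK compiling with an old target produces the same number.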

Suppose the same JDK is used to compile the sources.

Question: Is there a reliable, and ideally efficient, way to verify that both sets of binaries were built from the same sources? I would like to know whether all method signatures and class definitions are identical, and whether most, or maybe all, method implementations are identical or similar.

The library is quite large, so I believe that a detailed analysis of decompiled binaries may not be an option.

4 answers

I suggest a multi-step process:

Apply the previously suggested Jardiff or a similar tool to see whether there are any API differences. If possible, choose a tool that can also report on private methods and other non-public members. In practice, any significant change to the Java implementation is likely to change some methods and classes, even if the public API has not changed.
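As a rough, JDK-only stand-in for a Jardiff-style report that includes private members (this is not Jardiff's API; `MemberDiff` is a hypothetical name), the sketch below loads every class in each jar through reflection without initializing it and diffs the declared fields, methods and constructors. Assumption: the library's classes load with only the JDK visible; classes that need other jars are listed as UNLOADABLE instead of being compared.

```java
import java.io.File;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class MemberDiff {
    /** Every declared member of a class, including private and synthetic ones. */
    static Set<String> members(Class<?> c) {
        Set<String> out = new TreeSet<>();
        for (Field f : c.getDeclaredFields()) out.add(f.toString());
        for (Method m : c.getDeclaredMethods()) out.add(m.toString());
        for (Constructor<?> k : c.getDeclaredConstructors()) out.add(k.toString());
        return out;
    }

    /** Loads each class in the jar without running static initializers and records its members. */
    static Set<String> jarMembers(File jar) throws Exception {
        Set<String> out = new TreeSet<>();
        // Parent null: besides the jar itself, only the JDK's bootstrap classes are visible.
        try (JarFile jf = new JarFile(jar);
             URLClassLoader loader = new URLClassLoader(new URL[] {jar.toURI().toURL()}, null)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) continue;
                String cls = name.substring(0, name.length() - ".class".length()).replace('/', '.');
                try {
                    Class<?> c = Class.forName(cls, false, loader);
                    out.add("class " + c.getName());
                    out.addAll(members(c));
                } catch (ClassNotFoundException | LinkageError e) {
                    out.add("UNLOADABLE " + cls); // e.g. a dependency outside this jar
                }
            }
        }
        return out;
    }

    /** Lines starting with "-" exist only in the old jar, "+" only in the new one. */
    static List<String> diff(Set<String> oldSet, Set<String> newSet) {
        List<String> out = new ArrayList<>();
        for (String s : oldSet) if (!newSet.contains(s)) out.add("- " + s);
        for (String s : newSet) if (!oldSet.contains(s)) out.add("+ " + s);
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> d = diff(jarMembers(new File(args[0])), jarMembers(new File(args[1])));
        d.forEach(System.out::println);
        System.out.println(d.isEmpty() ? "no member differences" : d.size() + " differences");
    }
}
```

Usage: `java MemberDiff old.jar rebuilt.jar`. Reflection was chosen only to avoid third-party dependencies; a dedicated tool will give more detail.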

If the APIs match, compile several randomly selected source files with the specified compiler, decompile both the result and the original class files, and compare the results. If they match, apply the same process to larger and larger bodies of code until you either find a mismatch or have checked everything.
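Choosing the sample and growing it can be made reproducible. A minimal sketch, assuming you already have a list of source file paths; `SampleRounds` and `rounds` are hypothetical names, and doubling the round size is just one way to read "larger and larger". A fixed seed means a mismatch found in one run can be reproduced in the next.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SampleRounds {
    /**
     * Shuffles the files with a fixed seed and splits them into rounds of size
     * first, 2*first, 4*first, ... until every file belongs to some round.
     */
    public static List<List<String>> rounds(List<String> files, int first, long seed) {
        if (first < 1) throw new IllegalArgumentException("first round needs at least one file");
        List<String> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<String>> out = new ArrayList<>();
        int start = 0;
        int size = first;
        while (start < shuffled.size()) {
            int end = Math.min(shuffled.size(), start + size);
            out.add(new ArrayList<>(shuffled.subList(start, end)));
            start = end;
            size *= 2; // each round is twice as large as the previous one
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 20; i++) files.add("com/example/C" + i + ".java"); // hypothetical paths
        List<List<String>> r = rounds(files, 2, 42L);
        for (int i = 0; i < r.size(); i++) {
            System.out.println("round " + i + ": " + r.get(i).size() + " files");
        }
    }
}
```

With 20 files and a first round of 2, the rounds contain 2, 4, 8 and 6 files; you would stop at the first round that produces a mismatch.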

Differences in decompiled code are more likely to give you clues about what actually changed, and it is easier to filter out minor differences there than in the raw class files.

If you get a discrepancy, analyze it. It may be caused by something you don't care about. If so, try writing a script that removes that kind of difference, and then resume the compile-and-compare process. If you get widespread mismatches, experiment with compiler options such as optimization. If particular compiler settings eliminate the differences, continue the bulk comparison. The goal of this phase is to find a combination of compiler options and decompiled-code filters that produces a match on the sample files, and then to use that combination for a bulk comparison of the whole library.
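One common harmless difference in disassembled output is the constant-pool index (`#12`), which depends on how the compiler ordered the constant pool rather than on the source. A minimal filter, assuming you compare `javap -c -p` output saved to text files (a disassembler, swapped in here for the decompiler the answer mentions); `DisasmFilter` is a hypothetical name, and further rules for your own benign differences would go into `normalize`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DisasmFilter {
    // Constant-pool references such as "#3" or "#17"; their numbering is not source-determined.
    private static final Pattern CP_INDEX = Pattern.compile("#\\d+");

    /** Blanks constant-pool indices, collapses whitespace and drops empty lines. */
    public static List<String> normalize(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String s = CP_INDEX.matcher(line).replaceAll("#");
            s = s.replaceAll("\\s+", " ").trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> a = normalize(Files.readAllLines(Paths.get(args[0])));
        List<String> b = normalize(Files.readAllLines(Paths.get(args[1])));
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                System.out.println("first difference at normalized line " + i);
                System.out.println("< " + a.get(i));
                System.out.println("> " + b.get(i));
                return;
            }
        }
        System.out.println(a.size() == b.size() ? "identical after filtering" : "one file is a prefix of the other");
    }
}
```

Little information is lost by blanking the index, because javap's trailing comment (for example `// Method java/io/PrintStream.println:(Ljava/lang/String;)V`) still names the referenced member.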

If you cannot get a close enough match in the decompiled code, you probably don't have the right source code. However, if the APIs match, it may still be worth building your system and running your tests against the compiled result. If your tests pass at least as well with the version you built from source, continue working with it.


There are many JAR comparison tools. One that was pretty good is Jardiff. I have not used it for a while, but I am fairly sure it is still available. There are also some commercial offerings in the same space that may suit your needs.


The Jardiff referred to by Perception is a good start, but there is no way to be 100% certain, even in theory. The same source can be compiled with different compilers, different compiler configurations and different optimization levels, so beyond class and method signatures the bytecode cannot be compared reliably.

What do you mean by a "similar implementation" of a method? Suppose a smart compiler drops an else branch because it can prove the condition is always true. Are the two implementations similar? Yes and no. :-)

The best way to go, IMHO, is to write very good regression test cases that exercise every key function of your libraries. It can be a horror, but in the long run it can be cheaper than hunting for bugs. It all depends on your future plans for this project. There is no simple solution.
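One cheap way to get regression tests for code you did not write is differential testing: feed the same inputs to the old and the rebuilt library and compare the outcomes, including which exception is thrown. A minimal sketch; all names are hypothetical, and because both jars contain the same class names, in a real setup each `Function` would have to call into its own jar through a separate class loader (not shown). Comparing results via `toString()` is a simplification that only works for types with meaningful string forms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public class DifferentialCheck {
    /** Returns one line per input on which the old and new implementations disagree. */
    public static <I, O> List<String> mismatches(Function<I, O> oldImpl, Function<I, O> newImpl, List<I> inputs) {
        List<String> out = new ArrayList<>();
        for (I input : inputs) {
            String a = outcome(oldImpl, input);
            String b = outcome(newImpl, input);
            if (!a.equals(b)) out.add(input + ": old=" + a + ", new=" + b);
        }
        return out;
    }

    // Records either the returned value or the exception class, so both are compared.
    private static <I, O> String outcome(Function<I, O> f, I input) {
        try {
            return "value " + Objects.toString(f.apply(input));
        } catch (RuntimeException e) {
            return "threw " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the old and rebuilt library calls.
        Function<String, Integer> oldParse = Integer::parseInt;
        Function<String, Integer> newParse = s -> Integer.parseInt(s.trim());
        mismatches(oldParse, newParse, Arrays.asList("1", " 2", "x")).forEach(System.out::println);
    }
}
```

Here only the input `" 2"` is reported, because the old version throws `NumberFormatException` while the new one returns 2; `"x"` fails the same way in both and therefore counts as a match.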


For method signatures, use a tool like jardiff.

For implementation similarity, you have to rely on heuristics. Comparing bytecode at the opcode level is compiler dependent and can report many spurious differences. If that happens, you can fall back to comparing each class's methods using their LineNumberTable.

It gives you a list of line numbers for each method (as long as the class file was compiled with line-number debug information, which is often missing in very old or commercial libraries).

If two class files are compiled from the same source code, then at least the line numbers of each method must match exactly.

You can use a library such as Apache BCEL to retrieve the LineNumberTable:

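    import org.apache.bcel.classfile.ClassParser;
    import org.apache.bcel.classfile.JavaClass;
    import org.apache.bcel.classfile.LineNumber;
    import org.apache.bcel.classfile.LineNumberTable;
    import org.apache.bcel.classfile.Method;

    JavaClass fooClazz = new ClassParser("Foo.class").parse();
    for (Method m : fooClazz.getMethods()) {
        LineNumberTable lnt = m.getLineNumberTable();
        if (lnt == null) continue; // abstract/native method, or no line-number debug info
        for (LineNumber ln : lnt.getLineNumberTable()) {
            System.out.println(m.getName() + m.getSignature() + " " + ln.getLineNumber());
        }
    }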
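Once the line numbers of both versions are collected, for example into a map keyed by method name plus descriptor (BCEL's `getName()` + `getSignature()` gives keys like `foo(I)V`), the comparison itself needs only the JDK. A minimal sketch; `LineTableCompare` is a hypothetical name. It compares each method's line numbers as a set rather than as an ordered list, an assumption meant to tolerate compilers that emit table entries in a different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class LineTableCompare {
    /** Reports methods present in only one version, or whose line-number sets differ. */
    public static List<String> compare(Map<String, List<Integer>> oldLines,
                                       Map<String, List<Integer>> newLines) {
        List<String> problems = new ArrayList<>();
        TreeSet<String> all = new TreeSet<>(oldLines.keySet());
        all.addAll(newLines.keySet());
        for (String method : all) {
            List<Integer> a = oldLines.get(method);
            List<Integer> b = newLines.get(method);
            if (a == null) {
                problems.add("only in new: " + method);
            } else if (b == null) {
                problems.add("only in old: " + method);
            } else if (!new TreeSet<>(a).equals(new TreeSet<>(b))) {
                problems.add("line numbers differ: " + method + " " + a + " vs " + b);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> oldLines = new HashMap<>(); // hypothetical sample data
        Map<String, List<Integer>> newLines = new HashMap<>();
        oldLines.put("foo()V", Arrays.asList(10, 11, 12));
        newLines.put("foo()V", Arrays.asList(12, 10, 11));
        oldLines.put("bar()V", Arrays.asList(20));
        newLines.put("bar()V", Arrays.asList(21));
        compare(oldLines, newLines).forEach(System.out::println);
    }
}
```

In the sample data only `bar()V` is reported; `foo()V` has the same lines in a different order and counts as a match.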
