How can I control the order of constant pool entries using ASM?

I am implementing a transformation that removes unused elements from .class files to reduce their size. Because some constant pool entries will become unused, I have ASM recompute the constant pool rather than copy it from the input. However, the transformed .class files are sometimes larger than the originals, because ASM's constant pool ordering requires ldc_w instructions (with a 2-byte index) where the input .class file used ldc (with a 1-byte index). I would like to sort the constant pool manually so that the constants referenced by ldc come first.
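As a rough illustration of the size difference: ldc encodes the constant pool index in one unsigned byte, so it only reaches indices 1 to 255, while ldc_w spends two bytes on the index. The sketch below just encodes that rule; the class and method names are made up for this example and are not part of ASM.

```java
public class LdcSize {
    /** Bytes needed for the instruction that loads an int, float, String, etc. at this pool index. */
    static int loadInsnSize(int poolIndex) {
        if (poolIndex < 1 || poolIndex > 0xFFFF)
            throw new IllegalArgumentException("invalid constant pool index: " + poolIndex);
        // ldc   = 1-byte opcode + 1-byte index (indices 1..255)
        // ldc_w = 1-byte opcode + 2-byte index (any index)
        return poolIndex <= 0xFF ? 2 : 3;
    }

    public static void main(String[] args) {
        System.out.println(loadInsnSize(255)); // 2, fits in ldc
        System.out.println(loadInsnSize(256)); // 3, needs ldc_w
    }
}
```

Every ldc operand that ends up at index 256 or above therefore costs one extra byte per use, which is why the position of those constants matters.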

One might also want to sort the constant pool for other reasons: for example, to make a set of .class files more compressible by putting their constant pools in a canonical order, to test tools that consume .class files, to use the order as a software watermark, or to confuse badly implemented decompilers/deobfuscators.

I searched the ASM guide for "constant", but there were no useful hits other than a general explanation of what the constant pool is and "Hopefully ASM hides all the details related to the constant pool, so you will not have to bother about it", which is anti-helpful in this case.

How can I control the order in which ASM emits constant pool entries?

1 answer

ASM does not provide a clean way to do this, but it is possible if you are willing to define new classes in the org.objectweb.asm package (or use reflection to access package-private members). This is not ideal because it introduces a dependency on ASM's implementation details, but it is the best we can do. (If you know of a non-hack way to do this, please add it as another answer.)

Some things that do not work

ClassWriter exposes newConst (and variants for other constant pool entry types) to allow custom attributes to be implemented. Because ASM reuses constant pool entries, you might assume you can pre-populate the constant pool in the desired order by calling newConst and friends. However, many constant pool entries reference other constant pool entries (in particular, the Utf8 entries referenced by String and Class entries), and these methods automatically add the referenced entries if they are not already present. Thus it is not possible to put a String constant before the Utf8 entry it references, for example. These methods can be overridden, but that does not help, because this behavior is baked into the package-private or private methods they delegate to.
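The effect can be seen in a toy model. This is not ASM code; ToyPool and its methods are invented here purely to mimic the "add the referenced entry first" behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyPool {
    private final List<String> entries = new ArrayList<>();

    /** Returns the 1-based index of a Utf8 entry, adding it if absent. */
    int newUtf8(String s) {
        return indexOf("Utf8 " + s);
    }

    /** Returns the 1-based index of a String entry; its Utf8 is resolved (and added) first. */
    int newString(String s) {
        int utf8 = newUtf8(s); // the dependency gets an index before the entry itself
        return indexOf("String #" + utf8);
    }

    private int indexOf(String entry) {
        int i = entries.indexOf(entry);
        if (i < 0) {
            entries.add(entry);
            i = entries.size() - 1;
        }
        return i + 1; // constant pool indices start at 1
    }

    List<String> entries() {
        return entries;
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        int index = pool.newString("hello");
        System.out.println(index + " " + pool.entries()); // 2 [Utf8 hello, String #1]
    }
}
```

Even though only the String constant was requested, it lands at index 2 behind its Utf8 entry, so pre-populating through such methods cannot put the ldc-loadable constant first.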

This post suggests sorting ClassWriter's internal data structures in an overridden visitEnd. This does not work for two reasons. First, visitEnd is final (perhaps it wasn't in 2005, when that post was written). Second, ClassWriter emits class bytes during visitation, so by the time visitEnd is called, the constant pool has already been written as bytes and the constant pool indices are already baked into the code bytes.

The solution

The solution requires two rounds of class writing. First we write the class normally (including any other transformations), and then we use another ClassWriter with a pre-populated constant pool to parse and rewrite the result of the first round. Because ClassWriter builds the constant pool's bytes as it goes, we have to write them manually before starting the second parse and write. We encapsulate the second parse/write in the first ClassWriter's toByteArray method.

Here is the code. The actual sorting happens in the sortItems method; here we sort by the number of occurrences as an ldc/ldc_w operand (collected by a MethodVisitor; note that ClassWriter's visitMethod is final, so the collector must be a separate class). If you want to implement a different sort, change sortItems and add fields to store whatever your sort is based on.

    package org.objectweb.asm;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;

    public class ConstantPoolSortingClassWriter extends ClassWriter {
        private final int flags;
        Map<Item, Integer> constantHistogram; //initialized by ConstantHistogrammer

        public ConstantPoolSortingClassWriter(int flags) {
            super(flags);
            this.flags = flags;
        }

        @Override
        public byte[] toByteArray() {
            byte[] bytes = super.toByteArray();

            List<Item> cst = new ArrayList<>();
            for (Item i : items)
                for (Item j = i; j != null; j = j.next) {
                    //exclude ASM internal bookkeeping
                    if (j.type == TYPE_NORMAL || j.type == TYPE_UNINIT ||
                            j.type == TYPE_MERGED || j.type == BSM)
                        continue;
                    if (j.type == CLASS)
                        j.intVal = 0; //for ASM InnerClasses tracking
                    cst.add(j);
                }

            sortItems(cst);

            ClassWriter target = new ClassWriter(flags);
            //ClassWriter.put is private, so we have to do the insert manually
            //we don't bother resizing the hashtable
            for (int i = 0; i < cst.size(); ++i) {
                Item item = cst.get(i);
                item.index = target.index++;
                if (item.type == LONG || item.type == DOUBLE)
                    target.index++;

                int hash = item.hashCode % target.items.length;
                item.next = target.items[hash];
                target.items[hash] = item;
            }

            //because we didn't call newFooItem, we need to manually write pool bytes
            //we can call newFoo to find existing items, though
            for (Item i : cst) {
                if (i.type == UTF8)
                    target.pool.putByte(UTF8).putUTF8(i.strVal1);
                if (i.type == CLASS || i.type == MTYPE || i.type == STR)
                    target.pool.putByte(i.type).putShort(target.newUTF8(i.strVal1));
                if (i.type == IMETH || i.type == METH || i.type == FIELD)
                    target.pool.putByte(i.type)
                            .putShort(target.newClass(i.strVal1))
                            .putShort(target.newNameType(i.strVal2, i.strVal3));
                if (i.type == INT || i.type == FLOAT)
                    target.pool.putByte(i.type).putInt(i.intVal);
                if (i.type == LONG || i.type == DOUBLE)
                    target.pool.putByte(i.type).putLong(i.longVal);
                if (i.type == NAME_TYPE)
                    target.pool.putByte(i.type)
                            .putShort(target.newUTF8(i.strVal1))
                            .putShort(target.newUTF8(i.strVal2));
                if (i.type >= HANDLE_BASE && i.type < TYPE_NORMAL) {
                    int tag = i.type - HANDLE_BASE;
                    if (tag <= Opcodes.H_PUTSTATIC)
                        target.pool.putByte(HANDLE).putByte(tag)
                                .putShort(target.newField(i.strVal1, i.strVal2, i.strVal3));
                    else
                        target.pool.putByte(HANDLE).putByte(tag)
                                .putShort(target.newMethod(i.strVal1, i.strVal2, i.strVal3,
                                        tag == Opcodes.H_INVOKEINTERFACE));
                }
                if (i.type == INDY)
                    target.pool.putByte(INDY).putShort((int)i.longVal)
                            .putShort(target.newNameType(i.strVal1, i.strVal2));
            }

            //parse and rewrite with the new ClassWriter, constants presorted
            ClassReader r = new ClassReader(bytes);
            r.accept(target, 0);
            return target.toByteArray();
        }

        private void sortItems(List<Item> items) {
            items.forEach(i -> constantHistogram.putIfAbsent(i, 0));
            //constants appearing more often come first, so we use as few ldc_w as possible
            Collections.sort(items, Comparator.comparing(constantHistogram::get).reversed());
        }
    }
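The ordering rule in sortItems can be tried in isolation. The following is a minimal sketch that uses plain strings in place of ASM's Item objects; FrequencySort and sortByUse are hypothetical names chosen for this illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrequencySort {
    /** Returns the constants ordered by descending use count; unused constants count as 0. */
    static List<String> sortByUse(List<String> constants, Map<String, Integer> histogram) {
        Map<String, Integer> counts = new HashMap<>(histogram);
        constants.forEach(c -> counts.putIfAbsent(c, 0));
        List<String> sorted = new ArrayList<>(constants);
        // List.sort is stable, so constants with equal counts keep their relative order
        sorted.sort(Comparator.comparing((String c) -> counts.get(c)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> histogram = new LinkedHashMap<>();
        histogram.put("b", 3);
        histogram.put("d", 1);
        List<String> constants = List.of("a", "b", "c", "d");
        System.out.println(sortByUse(constants, histogram)); // [b, d, a, c]
    }
}
```

Because the sort is stable, entries that are never loaded by ldc keep the order in which they were collected.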

Here is ConstantHistogrammer, which lives in org.objectweb.asm so that it can refer to Item. This implementation is specific to the ldc sort, but it demonstrates how to perform other custom sorts based on information from the .class file.

    package org.objectweb.asm;

    import java.util.HashMap;
    import java.util.Map;

    public final class ConstantHistogrammer extends ClassVisitor {
        private final ConstantPoolSortingClassWriter cw;
        private final Map<Item, Integer> constantHistogram = new HashMap<>();

        public ConstantHistogrammer(ConstantPoolSortingClassWriter cw) {
            super(Opcodes.ASM5, cw);
            this.cw = cw;
        }

        @Override
        public MethodVisitor visitMethod(int access, String name, String desc,
                String signature, String[] exceptions) {
            return new CollectLDC(super.visitMethod(access, name, desc, signature, exceptions));
        }

        @Override
        public void visitEnd() {
            cw.constantHistogram = constantHistogram;
            super.visitEnd();
        }

        private final class CollectLDC extends MethodVisitor {
            private CollectLDC(MethodVisitor mv) {
                super(Opcodes.ASM5, mv);
            }

            @Override
            public void visitLdcInsn(Object cst) {
                //we only care about things ldc can load
                if (cst instanceof Integer || cst instanceof Float || cst instanceof String ||
                        cst instanceof Type || cst instanceof Handle)
                    constantHistogram.merge(cw.newConstItem(cst), 1, Integer::sum);
                super.visitLdcInsn(cst);
            }
        }
    }

Finally, here is how you use them:

    byte[] inputBytes = Files.readAllBytes(input);
    ClassReader cr = new ClassReader(inputBytes);
    ConstantPoolSortingClassWriter cw = new ConstantPoolSortingClassWriter(0);
    ConstantHistogrammer ch = new ConstantHistogrammer(cw);
    ClassVisitor s = new SomeOtherClassVisitor(ch);
    cr.accept(s, 0);
    byte[] outputBytes = cw.toByteArray();

The transformation applied by SomeOtherClassVisitor happens only on the first visit, not on the second visit inside cw.toByteArray().

There is no test suite for this, but I applied the above sort to rt.jar from Oracle JDK 8u40, and NetBeans 8.0.2 works normally using the transformed class files, so it is at least mostly correct. (The transformation saved 12,684 bytes, which is hardly worth it on its own.)
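In the absence of a test suite, one cheap sanity check is to list the constant pool tags of an output file in file order and confirm that the ldc-loadable entries come first. The sketch below uses only the JDK and is not part of the code above; PoolTags is a hypothetical helper name, and the tag numbers and entry sizes are taken from the JVM specification's class file format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class PoolTags {
    /** Returns the tag of each constant pool entry, in the order it appears in the file. */
    static List<Integer> read(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (in.getInt() != 0xCAFEBABE)
            throw new IllegalArgumentException("not a class file");
        in.getShort(); // minor_version
        in.getShort(); // major_version
        int count = in.getShort() & 0xFFFF; // constant_pool_count is number of slots + 1
        List<Integer> tags = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;
            tags.add(tag);
            int skip;
            switch (tag) {
                case 1: skip = in.getShort() & 0xFFFF; break;          // Utf8: u2 length + bytes
                case 3: case 4: skip = 4; break;                       // Integer, Float
                case 5: case 6: skip = 8; i++; break;                  // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: skip = 2; break; // Class, String, MethodType, Module, Package
                case 9: case 10: case 11: case 12: case 17: case 18: skip = 4; break; // refs, NameAndType, Dynamic, InvokeDynamic
                case 15: skip = 3; break;                              // MethodHandle
                default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
            in.position(in.position() + skip);
        }
        return tags;
    }
}
```

Calling PoolTags.read(outputBytes) on the result of the usage snippet would give the tag sequence of the sorted pool; comparing it with the sequence for inputBytes shows whether the order actually changed.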

The code is available as a Gist, under the same license as ASM itself.

Source: https://habr.com/ru/post/1215455/