Understanding the disassembly of Dalvik code?

I am playing with smali and baksmali on a small Hello World Android app that I wrote. My source code:

    package com.hello;

    import android.app.Activity;
    import android.os.Bundle;

    public class Main extends Activity {
        /** Called when the activity is first created. */
        @Override
        public void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.main);
        }
    }

which was then disassembled into:

    .class public Lcom/hello/Main;
    .super Landroid/app/Activity;
    .source "Main.java"

    # direct methods
    .method public constructor <init>()V
        .locals 0

        .prologue
        .line 6
        invoke-direct {p0}, Landroid/app/Activity;-><init>()V

        return-void
    .end method

    # virtual methods
    .method public onCreate(Landroid/os/Bundle;)V
        .locals 1
        .parameter "savedInstanceState"

        .prologue
        .line 10
        invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V

        .line 11
        const/high16 v0, 0x7f03

        invoke-virtual {p0, v0}, Lcom/hello/Main;->setContentView(I)V

        .line 12
        return-void
    .end method

I understand that this is some kind of intermediate representation, but I am not sure what exactly it is. As far as I understand, there must be some specification of how to read this representation, but I cannot work out how to search for it. So, given an apk file, can anyone explain in layman's terms how the Dalvik opcode specification is used to arrive at this representation? My current understanding is this:

  • Given the APK, I could extract AndroidManifest.xml in its binary XML format and use a tool like axml2xml.pl to get a "text" version of the manifest, which is not complete, OR I could use apktool to get a more readable form. But I am still not sure which specification these tools use to convert binary XML to text.
  • Disassemblers somehow use the Dalvik opcode specification to read dex files and convert them to the representation above.

Any information (perhaps with some simple examples) on the two steps above would help me understand these concepts correctly.

Update 1 (published after Chris's answer):

So, I would do the following to arrive at the Dalvik bytecode:

  • Take the apk and extract it to get the classes.dex file.
  • Then the disassembler reads the classes.dex file and determines all the classes present in the apk. Can you give me some information on how this is done? Does it parse the file as raw bytes, look the values up in the Dalvik specification, and resolve them accordingly? Or is something else going on? For example, when I ran hexdump on classes.dex, it gave me something like this:

    64 65 78 0a 30 33 ...

Are these bytes used for the opcode lookup?

  • Assuming the tool was able to split the incoming bytecode into separate classes, does it then continue to scan the hex codes from the classes.dex file and use the Dalvik specification to output the corresponding opcode name from the table?

In short, I am interested in knowing how all this "magic" is done. For example, if I wanted to learn to write such a tool, what high-level roadmap should I follow?

2 answers

What you are looking at is Dalvik bytecode. Java code is converted to Dalvik bytecode by the dx tool. The manifest is a separate issue, which I will get to in a minute. Effectively, when you build your Android application, the dx tool converts your compiled Java code into Dalvik bytecode (much as javac converts Java source to Java bytecode for a standard JVM application), using the Dalvik opcodes (each opcode is one byte, so there is room for 256 of them).

For example, invoke-super is an opcode that instructs the DVM (Dalvik virtual machine) to invoke a method on the superclass. Similarly, invoke-interface instructs the DVM to invoke an interface method.

So you can see that

 super.onCreate(savedInstanceState); 

translates to

 invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V 

In this case, invoke-super takes two parameters: the register group {p0, p1} and the method reference Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V, which is the method specification it uses to look up and, if necessary, resolve the method.
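To make the register group concrete, here is a rough sketch of how a disassembler might unpack one such instruction. It assumes the documented 35c instruction layout (three 16-bit code units: A|G|op, BBBB, F|E|D|C) that the invoke-* opcodes use. The class name, the method name, and the method index 0x0005 are made up for illustration, and a real tool would resolve the index through the dex method table instead of printing it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: decode one format-35c instruction from its three 16-bit code units. */
final class Invoke35cSketch {

    static String decode(int unit0, int unit1, int unit2) {
        int opcode = unit0 & 0xff;           // low byte of the first unit
        int argCount = (unit0 >> 12) & 0xf;  // A: number of argument registers (0..5)
        int fifthReg = (unit0 >> 8) & 0xf;   // G: fifth register, only used when A == 5
        int methodIndex = unit1 & 0xffff;    // BBBB: index into the dex method table

        // The third unit packs registers C, D, E, F, lowest nibble first.
        List<String> regs = new ArrayList<>();
        for (int i = 0; i < argCount && i < 4; i++) {
            regs.add("v" + ((unit2 >> (4 * i)) & 0xf));
        }
        if (argCount == 5) {
            regs.add("v" + fifthReg);
        }
        return String.format("op=0x%02x {%s}, method@0x%04x",
                opcode, String.join(", ", regs), methodIndex);
    }

    public static void main(String[] args) {
        // 0x6f is invoke-super; the method index 0x0005 is hypothetical.
        System.out.println(decode(0x206f, 0x0005, 0x0021));
    }
}
```

With .locals 1, the parameters occupy the last registers of the method, so p0 and p1 show up here as v1 and v2. baksmali prints them in the p-notation to make the parameters easier to spot.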

Then there is the invoke-direct call in the constructor.

 invoke-direct {p0}, Landroid/app/Activity;-><init>()V 

Each class has an <init> method, also known as the constructor, which initializes the data members of the class. When an object is constructed, the superclass constructor must run as well. This explains why the constructor of your class calls the Activity constructor.

As for the manifest, what happens (the format is defined in the Android source code, if you check it) is that the build tooling that generates the apk file converts the manifest into a more compact format (binary XML) in order to save space. The manifest has nothing to do with the code you posted; rather, it tells the DVM how to handle the application as a whole with respect to Activities, Services, etc. What you posted is the part that actually gets executed.
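As a small illustration of what "binary XML" means in practice, the sketch below checks whether a byte array starts like a compiled manifest. It assumes the chunk header layout from the Android platform's ResourceTypes.h (a 16-bit type, a 16-bit header size, and a 32-bit total size, all little-endian, with type 0x0003 for XML). The class and method names are hypothetical, and tools such as apktool do far more than this, walking the string pool and the element chunks.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch only: recognise the chunk header at the start of a binary AndroidManifest.xml. */
final class BinaryXmlHeaderSketch {
    static final int RES_XML_TYPE = 0x0003; // chunk type for a compiled XML document

    static boolean looksLikeBinaryXml(byte[] data) {
        if (data.length < 8) {
            return false; // too short to hold a chunk header
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int type = buf.getShort(0) & 0xffff;
        int headerSize = buf.getShort(2) & 0xffff;
        return type == RES_XML_TYPE && headerSize == 8;
    }

    /** Total size declared by the chunk header (normally the size of the whole file). */
    static long declaredSize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(4) & 0xffffffffL;
    }
}
```

A plain-text manifest starts with '<' (0x3c), so a check like this is enough to tell the two forms apart before choosing a parser.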

This is a high level answer to your question. If you need more, let me know and I will do my best.

Edit: You are basically right. The disassembler reads the binary data from the dex file as a stream of bytes. It knows what the format should be and can pull out information such as constants, classes, etc. As for opcodes, that is exactly what it does: it knows the byte value of each opcode (as represented in the dex file) and can convert it to a human-readable string. If you are going to implement this, then besides understanding the general basics of compilers, I would start with a deep understanding of the structure of the dex file. From there, you would need to build a table that maps opcode values to human-readable strings. With that information, plus some additional information about string constants and so on, you could create a textual representation of the compiled class. Does that make sense?
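A minimal sketch of that lookup table, limited to the opcodes that appear in the question's listing. The opcode values and instruction lengths are taken from the Dalvik bytecode reference; the class name, the helper, and the method indices in the sample bytes are made up. A real disassembler would also decode the operands and cover the full opcode set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: map opcode bytes to mnemonics for a handful of Dalvik instructions. */
final class OpcodeTableSketch {

    /** Mnemonic plus instruction length in 16-bit code units. */
    static final class Op {
        final String name;
        final int units;

        Op(String name, int units) {
            this.name = name;
            this.units = units;
        }
    }

    static final Map<Integer, Op> TABLE = new HashMap<>();
    static {
        TABLE.put(0x0e, new Op("return-void", 1));
        TABLE.put(0x15, new Op("const/high16", 2));
        TABLE.put(0x6e, new Op("invoke-virtual", 3));
        TABLE.put(0x6f, new Op("invoke-super", 3));
        TABLE.put(0x70, new Op("invoke-direct", 3));
    }

    /** Walks a method's instruction bytes and returns the mnemonic of each instruction. */
    static List<String> mnemonics(byte[] code) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos + 1 < code.length) {
            int opcode = code[pos] & 0xff; // code units are little-endian: opcode byte first
            Op op = TABLE.get(opcode);
            if (op == null) {
                out.add(String.format("unknown-0x%02x", opcode));
                break; // without the length we cannot find the next instruction
            }
            out.add(op.name);
            pos += op.units * 2; // skip the operands
        }
        return out;
    }
}
```

The instruction length matters as much as the name: the table has to say how many code units each opcode occupies, otherwise the walker cannot tell where the next opcode starts.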


The opcode specification describes only the instructions. The dex file format is something more: it contains all the metadata that the Dalvik virtual machine (and a disassembler) needs to interpret the file, such as strings, classes, types, methods, etc. See also the official opcode list; it is more complete and detailed than the one you linked.
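To illustrate the metadata side, here is a rough sketch that reads a few fields of the dex header. The bytes 64 65 78 0a 30 33 from the question's hexdump are the start of the header magic (ASCII "dex\n03..."), not opcodes. The field offsets assume the standard 0x70-byte header layout from the dex format documentation; the class and field names are my own, and many header fields are left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Sketch only: read a few fields from the 0x70-byte header of a classes.dex file. */
final class DexHeaderSketch {
    final String version;   // three ASCII digits from the magic, e.g. "035"
    final long fileSize;
    final int stringIdsSize;
    final int typeIdsSize;
    final int methodIdsSize;
    final int classDefsSize;

    private DexHeaderSketch(String version, long fileSize, int strings, int types,
                            int methods, int classes) {
        this.version = version;
        this.fileSize = fileSize;
        this.stringIdsSize = strings;
        this.typeIdsSize = types;
        this.methodIdsSize = methods;
        this.classDefsSize = classes;
    }

    static DexHeaderSketch parse(byte[] data) {
        if (data.length < 0x70) {
            throw new IllegalArgumentException("too short for a dex header");
        }
        // Magic: 'd' 'e' 'x' '\n', three version digits, then a NUL byte.
        if (data[0] != 'd' || data[1] != 'e' || data[2] != 'x' || data[3] != '\n'
                || data[7] != 0) {
            throw new IllegalArgumentException("not a dex file");
        }
        String version = new String(data, 4, 3, StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return new DexHeaderSketch(
                version,
                buf.getInt(0x20) & 0xffffffffL, // file_size
                buf.getInt(0x38),               // string_ids_size
                buf.getInt(0x40),               // type_ids_size
                buf.getInt(0x58),               // method_ids_size
                buf.getInt(0x60));              // class_defs_size
    }
}
```

Each size is paired in the header with an offset to the corresponding table, and that is how a disassembler finds the class definitions and, from them, the bytecode of each method.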

<plug> BTW, the next version of IDA Pro will support parsing .dex files </plug>
