Automated x86 command obfuscation

Question

Automated x86 command obfuscation

I am working on an asm x86 asm obfuscator that takes Intel syntax code as a string and outputs an equivalent set of opcodes that get confused.

Here is an example:

mov eax, 0x5523 or eax, [ebx] push eax call someAPI

It becomes something like:

 mov eax, 0xFFFFFFFF ; mov eax, 0x5523 and eax, 0x5523 ; push [ebx] ; xor eax, [ebx] or [esp], eax ; pop eax ; push 12345h ; push eax mov [esp], eax ; call getEIP ; call someAPI getEIP: ; add [esp], 9 ; jmp someAPI ;

This is just an example, I have not verified that this will not ruin the flags (maybe this is so).

Now I have an XML document that lists command templates (e.g. push e*x ) and a list of replacement instructions that you can use.

What I'm looking for is a way to automatically generate opcode sequences that produce the same result as the input. I am not opposed to doing educated brute force, but I am not sure how I approach this.

+7

assembly x86 automation obfuscation

Polynomial Oct 30 '11 at 19:47

source share

1 answer

Ira Baxter · Accepted Answer · 2011-10-30T20:19:48+0000

What you need is an algebraic descriptor of what the operation codes do, and a set of algebraic laws that allow you to define equivalent operations.

Then for each instruction you will find its algebraic description (for example,

  XOR eax,mem[ecx]

whose algebraic equivalent is <

  eax exclusive_or mem[ecx]

list algebraic equivalences using these algebra equivalents, such as:

  a exclusive_or b ==> (a and not b) or (b and not a)

to generate an equivalent algebraic expression for your XOR instruction

  eax exclusive_or mem[ecx] ==> (eax and not mem[ecx]) or (mem[ecx] and not eax)

You can apply additional algebraic laws to this, for example, Morgan's theorem:

  a or b ==> not (not a and not b)

To obtain

 (not (not (eax and not mem[ecx])) and (not (mem[ecx] and not eax)))

At this point, you have a specification for algebraic computing that will do the same as the original. There is your brute force.

Now you need to “compile” this for machine instructions, matching which instructions will do what it says. Like any compiler, you probably want to optimize the generated code (it makes no sense to extract mem [ecx] twice). (All this is complicated ... its code generator!) The resulting code sequence will look something like this:

 mov ebx, mem[ecx] mov edx, ebx not edx and edx, eax not eax and eax, ebx not eax or eax, edx

These are many mechanisms for manual assembly.

Another way to do this is to use a program conversion system that allows you to apply source transformations to a code source. Then you can code "equivalence" as you rewrite it directly in the code.

One of these tools is our DMS Software Reengineering Toolkit .

DMS accepts the definition of langauge (essentially, like EBNF), automatically implements the analyzer, AST constructor, and prettyprinter (anti-parser, turning the AST back into actual source text). [DMS does not currently have EBNF for ASM86, but dozens of EBNFs for various DMSs have created complex langauges, including several for different non-x86 assemblers. Therefore, you will need to define ASM86 EBNF for DMS. It is pretty simple; DMS has a really strong parser].

Using this, DMS will allow you to write source transformations directly in code. You can write the following transformations that directly implement the XOR equivalent and DeMorgan law:

  domain ASM86; rule obfuscate_XOR(r: register, m: memory_access):instruction:instruction = " XOR \r, \m " rewrites to " MOV \free_register\(\),\m NOT \free_register\(\) AND \free_register\(\),\r NOT \r AND \r,\m OR \r,\free_register\(\)"; rule obfuscate_OR(r1: register, r2: register):instruction:instruction = " OR \r1, \r2" rewrites to " MOV \free_register\(\),\r1 NOT \free_register\(\) AND \free_register\(\),\r2 NOT \r2 AND \r1,\r2 NOT \r1";

with some extra magic in a meta-procedure called "free_register" that determines which registers are free at that point (AST matches) in the code. (If you do not want to do this, use the top of the stack as temporary everywhere, as in your example).

You will need a bunch of rewrites to cover all the cases you want to confuse, with their combinatorics with registers and memory operands.

Then, the transformation engine may be asked to apply these transformations at random one or more times at each point in the code to scramble it.

You can see a fully processed example of some algebraic transformations applied to DMS.

Automated x86 command obfuscation

More articles: