Python bytecode obfuscation through interpreter mutation

Dropbox actually did this very well: they managed to protect their desktop application, which is written in Python. I researched this a lot but found no solution better than obfuscation, which is not a very safe way to go, and you will eventually see your code uploaded somewhere.

I listened to a talk by Giovanni Bajo (founder of PyInstaller), in which he said Dropbox does this:

  • Bytecode scrambling by recompiling the CPython interpreter, so that the standard CPython interpreter cannot run the bytecode; only the recompiled interpreter can.
  • All you have to do is shuffle the numbers below define loadup 8.

I have never read the Python source code, so I will not claim to fully understand the words above.

I would like to hear from the experts: how can this be done? And after recompiling, can I still package my application with available tools such as PyInstaller?

Update:

I did some research on how Dropbox performs this type of obfuscation / mutation, and I found this:

According to Hagen Fritsch, they do this in two stages:

  • They use the TEA cipher along with an RNG seeded with some values in the code object of each Python module. They changed the interpreter accordingly so that it

    a) decrypts the modules and

    b) prevents access to the decrypted code objects.

    Without (b), it would have been straightforward to just let Dropbox decrypt everything and then dump the modules using the built-in marshaller.

  • Another trick is manual scrambling of the opcodes. Unfortunately, this could only be undone semi-automatically, so their monoalphabetic substitution cipher proved quite effective at buying some time.

I still need to learn more about how this can be done; moreover, I do not know how decryption takes place in this process ... I would like to hear from all the experts here ... come on guys, where are you?

1 answer

I assume this is about shuffling the numbers in Include/opcode.h. I do not see a #define loadup there, but maybe this refers to some old version of Python. I have not tried this.
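The effect of such a shuffle can be simulated without rebuilding CPython. The following is only a rough sketch under assumptions: the seed-based permutation of all 256 byte values is hypothetical, and the code assumes CPython 3.6 or later, where every instruction is two bytes (opcode, then argument). A real change to the opcode numbers would also have to respect whatever constraints the interpreter places on them, which this sketch ignores.

```python
import random


def make_opcode_permutation(seed):
    """Return (forward, inverse) 256-entry tables for a hypothetical opcode shuffle."""
    rng = random.Random(seed)
    forward = list(range(256))
    rng.shuffle(forward)
    inverse = [0] * 256
    for standard, scrambled in enumerate(forward):
        inverse[scrambled] = standard
    return forward, inverse


def remap_opcodes(bytecode, table):
    """Remap every opcode byte; assumes 2-byte instructions (CPython 3.6+)."""
    out = bytearray(bytecode)
    for i in range(0, len(out), 2):  # even offsets: opcodes, odd offsets: arguments
        out[i] = table[out[i]]
    return bytes(out)


def sample(x):
    return x * 2 + 1


forward, inverse = make_opcode_permutation(seed=1234)  # seed is an arbitrary choice
original = sample.__code__.co_code
scrambled = remap_opcodes(original, forward)   # what a mutated interpreter would expect
restored = remap_opcodes(scrambled, inverse)   # what an attacker wants to get back

print("round trip ok:", restored == original)
```

Only an interpreter built with the same table can execute the scrambled bytes; the arguments at odd offsets are left untouched, which is why this amounts to a simple substitution on opcodes.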

This will scramble your .pyc files so that they cannot be inspected with tools that understand regular .pyc files. That can help you hide some security measures inside your program. However, an attacker can (for example) extract your custom Python interpreter from your application package and use it to inspect the files. (Just start the interactive interpreter and begin exploring by importing a module and calling dir on it.)

Note that your package will certainly contain some modules from the Python standard library. If an attacker suspects that you have shuffled the opcodes, they could do a byte-by-byte comparison between your version of a standard module and the regular version, and thereby recover your opcode mapping. To prevent this simple attack, you can protect the modules with proper encryption and try to hide the decryption step inside the interpreter, as mentioned in the updated question. This forces the attacker to debug machine code to find the decryption routine.
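The comparison attack can be sketched in a few lines. This is an illustration under assumptions, not a tool: SECRET_TABLE is a hypothetical stand-in for the vendor's shuffled opcode table, the scrambled bytecode is produced in place rather than read from a real package, and 2-byte instructions (CPython 3.6+) are assumed.

```python
import textwrap

# Hypothetical secret table: an affine map that permutes 0..255
# (7 is odd, so it is invertible modulo 256).
SECRET_TABLE = [(op * 7 + 3) % 256 for op in range(256)]


def scramble(bytecode):
    """Simulate what the vendor's build would ship for a standard module."""
    out = bytearray(bytecode)
    for i in range(0, len(out), 2):
        out[i] = SECRET_TABLE[out[i]]
    return bytes(out)


def recover_mapping(pairs):
    """Learn {scrambled_opcode: standard_opcode} from (reference, scrambled) pairs."""
    mapping = {}
    for reference, scrambled in pairs:
        for i in range(0, len(reference), 2):
            mapping[scrambled[i]] = reference[i]
    return mapping


# Reference bytecode comes from the attacker's own, unmodified standard library.
functions = [textwrap.dedent, textwrap.indent, textwrap.shorten]
references = [f.__code__.co_code for f in functions]
pairs = [(ref, scramble(ref)) for ref in references]

recovered = recover_mapping(pairs)
print(f"recovered {len(recovered)} opcode mappings")
```

Every opcode that occurs in the compared modules is recovered; comparing more of the standard library fills in more of the table.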


I do not know how decryption occurs in this process ...

You have to modify the part of the interpreter that imports modules and insert the C decryption code there.
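To show where the decryption sits in the import path, here is a prototype in Python rather than C. It is a sketch under assumptions: the key, the in-memory ENCRYPTED_MODULES table and the module name secret_mod are all hypothetical, TEA is applied block by block with a fixed key, and the RNG seeding mentioned in the question is left out because the details are not given. A hook written in Python like this would itself be easy to read, which is why the real thing belongs in the interpreter's C code.

```python
import importlib.abc
import importlib.util
import marshal
import struct
import sys

MASK, DELTA, ROUNDS = 0xFFFFFFFF, 0x9E3779B9, 32
KEY = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)  # hypothetical key


def tea_block(v0, v1, key, decrypt):
    """Encrypt or decrypt one 64-bit block (two 32-bit words) with TEA."""
    k0, k1, k2, k3 = key
    if decrypt:
        total = (DELTA * ROUNDS) & MASK
        for _ in range(ROUNDS):
            v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
            v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
            total = (total - DELTA) & MASK
    else:
        total = 0
        for _ in range(ROUNDS):
            total = (total + DELTA) & MASK
            v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
            v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea_crypt(data, key, decrypt=False):
    """Process data (length must be a multiple of 8) one block at a time."""
    out = bytearray()
    for offset in range(0, len(data), 8):
        v0, v1 = struct.unpack_from(">2I", data, offset)
        out += struct.pack(">2I", *tea_block(v0, v1, key, decrypt))
    return bytes(out)


def protect(source, name):
    """Build-time step: compile, marshal, zero-pad to 8 bytes, encrypt."""
    plain = marshal.dumps(compile(source, name, "exec"))
    return tea_crypt(plain + b"\0" * (-len(plain) % 8), KEY)


# Hypothetical shipped package: module name -> encrypted marshalled code.
ENCRYPTED_MODULES = {"secret_mod": protect("ANSWER = 6 * 7", "secret_mod")}


class DecryptingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname in ENCRYPTED_MODULES:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        blob = ENCRYPTED_MODULES[module.__name__]
        # marshal.loads ignores the trailing padding bytes.
        code = marshal.loads(tea_crypt(blob, KEY, decrypt=True))
        exec(code, module.__dict__)


sys.meta_path.insert(0, DecryptingFinder())
import secret_mod

print(secret_mod.ANSWER)  # 42
```

The decrypted code object exists only inside exec_module here; preventing access to it afterwards, as described in the question, is a separate change to the interpreter that this prototype does not attempt.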
