Reusing characters in a compiled .exe file

Once, out of curiosity, I tried hex-editing the executable file of the game "Dangerous Dave". I searched the file for any strings I could find and made some random changes to see whether they really changed the text displayed in the game.

I was surprised by the result, which I have now recreated using a hex editor and DOSBox: [screenshot]

As you can see, editing the two characters "RO" in the string "ROMERO" changed 4 characters on screen, resulting in "ZUMEZU". The program seems to reuse the same two characters and print them at both the beginning and the end of this string.

What is the reason for this? My first guess would be an attempt to make the executable smaller, but the code needed to reuse the characters would probably take more space than the 2 bytes it saves. Is this a deliberate trick by the author, or some kind of compiler voodoo?

1 answer

It is hard to say without reverse engineering, but I assume that much of the constant data in the program is compressed using an algorithm from the LZ family. These compression schemes work essentially the way you noticed: they encode repeated substrings as back-references to data that has already been decoded.
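To illustrate the idea, here is a minimal LZ77-style decoder sketch in Python. The token format (a literal byte string, or a `(distance, length)` pair) and the name `lz_decode` are assumptions made for this example; the actual format used in the game's executable is unknown without reverse engineering it.

```python
def lz_decode(tokens):
    """Decode a list of hypothetical LZ77-style tokens.

    Each token is either a bytes literal, copied as-is, or a
    (distance, length) pair meaning: copy `length` bytes starting
    `distance` bytes back in the output decoded so far.
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, bytes):
            out += token
        else:
            distance, length = token
            # Copy byte by byte so a reference may overlap its own output.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)


# "ROMERO" stored as the literal "ROME" plus a reference to the first two bytes.
original = [b"ROME", (4, 2)]
# Hex-editing only the literal "RO" to "ZU" also changes the copied bytes.
edited = [b"ZUME", (4, 2)]

print(lz_decode(original))  # b'ROMERO'
print(lz_decode(edited))    # b'ZUMEZU'
```

In this sketch the second "RO" never appears in the stored data at all, which would explain why changing two bytes in the file changes four characters on screen.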

The compression was probably applied not just to this one string, and not just to text; it may also have been used for other data, such as graphics or level layouts. In short, the algorithm probably saved a significant amount of space overall, even after accounting for the size of the decompression code.

Compression like this was common in older games as a way to save disk space, but it wasn't automatic: the implementation was most likely something Romero added himself.
