Memory allocation optimization: from heap to stack

I am doing some reverse engineering on binaries for the 32-bit x86 architecture.

I recently came across an interesting optimization between the C source code and the compiled program.

For example, the source code looks like this (it is from the OpenSSL library):

    powerbufFree = (unsigned char *)malloc(powerbufLen);

After compilation (gcc version 4.8.4, -O3), the generated code looks like this:

    807eaa0: cmp    eax, 0xbff                 # eax holds the length of the buf.
    807eaa5: mov    dword ptr [ebp-0x68], eax  # store the length of powerbuf on the stack
    807eaa8: jnle   0x807ec60                  # 0x807ec60 refers to the malloc
    807eaae: mov    edx, eax
    807eab0: add    eax, 0x5e
    807eab3: and    eax, 0xfffffff0
    807eab6: sub    esp, eax
    807eab8: lea    eax, ptr [esp+0x23]
    807eabc: and    eax, 0xffffffc0
    807eabf: add    eax, 0x40
    807ead3: mov    dword ptr [ebp-0x60], eax  # store the base addr of the buf on the stack.

To my surprise, the buffer is actually allocated on the stack. It looks to me like an optimization of the heap allocation, but I'm not sure.

So here is my question: is the optimization described above (malloc -> stack allocation) a familiar one? Does it make sense? Can someone point me to documentation or a specification for such an optimization?

1 answer

From source bn_exp.c:

    0634 #ifdef alloca
    0635     if (powerbufLen < 3072)
    0636         powerbufFree = alloca(powerbufLen+MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH);
    0637     else
    0638 #endif
    0639     if ((powerbufFree=(unsigned char*)OPENSSL_malloc(powerbufLen+MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH)) == NULL)
    0640         goto err;

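Note that 0xbff is 3071, so the cmp/jnle pair in your assembly jumps to the malloc path only when the length is greater than 3071, which is the inverse of the powerbufLen < 3072 test above. On systems that support it, alloca performs stack allocation. This is true of the GNU version, which is used by Linux, and the BSD implementations copied this API from 32V UNIX from AT&T (according to FreeBSD).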
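As a rough illustration of what the alloca branch does, here is a minimal sketch of the same pattern, assuming glibc's <alloca.h> on Linux. The function reverse_in_place and the constant STACK_LIMIT are hypothetical names made up for this example, not OpenSSL code; only the 3072 threshold is borrowed from the listing above.

```c
#include <alloca.h>   /* non-standard; assumed available (glibc on Linux) */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STACK_LIMIT 3072  /* hypothetical name; mirrors the threshold in bn_exp.c */

/* Reverse buf in place via a temporary copy.
 * Returns 0 on success, -1 if the heap allocation fails. */
int reverse_in_place(unsigned char *buf, size_t len)
{
    unsigned char *tmp;
    int on_heap = 0;

    if (len < STACK_LIMIT) {
        /* Stack allocation: released automatically when this function
         * returns, so alloca must be called in the function that uses
         * the memory. */
        tmp = alloca(len + 1);   /* +1 avoids a zero-sized request */
    } else {
        tmp = malloc(len);
        if (tmp == NULL)
            return -1;
        on_heap = 1;
    }

    memcpy(tmp, buf, len);
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];

    if (on_heap)
        free(tmp);   /* memory obtained from alloca must never be passed to free */
    return 0;
}
```

Small inputs never touch the heap here, and the only bookkeeping needed is the flag that records which path was taken.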

You only looked at line 639. But if alloca is defined, then the C code matches your assembly.
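To make the correspondence easier to check, the arithmetic on the stack-allocation path of the assembly can be transcribed into C. This is only a sketch of the address computation, not OpenSSL code. The function names are hypothetical, and it rests on two assumptions: that the 0x5e in the size rounding is a 64-byte MOD_EXP_CTIME_MIN_CACHE_LINE_WIDTH plus the compiler's alignment padding, and that the final and/add pair is the cache-line alignment of the buffer.

```c
#include <stdint.h>

/* add eax, 0x5e ; and eax, 0xfffffff0 ; sub esp, eax
 * Number of bytes subtracted from esp for a requested length len. */
uint32_t stack_bytes_reserved(uint32_t len)
{
    return (len + 0x5eu) & 0xfffffff0u;   /* truncated to a multiple of 16 */
}

/* lea eax, [esp+0x23] ; and eax, 0xffffffc0 ; add eax, 0x40
 * Value left in eax, given esp after the subtraction:
 * an address that is a multiple of 64. */
uint32_t aligned_base(uint32_t esp)
{
    return ((esp + 0x23u) & 0xffffffc0u) + 0x40u;
}
```

For example, aligned_base(0x1000) gives 0x1040 and stack_bytes_reserved(3071) gives 0xc50. The result of aligned_base is always a multiple of 64.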

The optimization itself is often used to avoid the cost of calling malloc for a temporary buffer when the allocation is relatively small. In C99, a VLA (variable-length array) can be used instead (since C11, VLAs are an optional feature).

Sometimes the optimization just uses a fixed-size buffer of some reasonably small size. For instance:
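A VLA-based variant might look roughly like the sketch below. It assumes a compiler that supports VLAs (C99, or C11 without __STDC_NO_VLA__ defined). The function median and the constant VLA_LIMIT are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VLA_LIMIT 1024  /* hypothetical cap on the number of ints kept on the stack */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Store the (upper) median of values[0..n-1] in *out without modifying
 * the input. Returns 0 on success, -1 if n is 0 or malloc fails. */
int median(const int *values, size_t n, int *out)
{
    if (n == 0)
        return -1;

    /* A VLA must have a nonzero length, so fall back to one element
     * when the heap path is taken. */
    size_t vla_len = (n <= VLA_LIMIT) ? n : 1;
    int stack_buf[vla_len];
    int *tmp = stack_buf;

    if (n > VLA_LIMIT) {
        tmp = malloc(n * sizeof *tmp);
        if (tmp == NULL)
            return -1;
    }

    memcpy(tmp, values, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);
    *out = tmp[n / 2];

    if (tmp != stack_buf)
        free(tmp);
    return 0;
}
```

The size cap matters as much here as it does with alloca: neither a VLA nor alloca reports failure if the request does not fit on the stack.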

    char tmp_buf[1024];
    char *tmp = tmp_buf;
    if (bytes_needed > 1024) {
        tmp = malloc(bytes_needed);
    }
    /* ... */
    if (tmp != tmp_buf) {
        free(tmp);
    }