Where are the null terminators when C strings are converted to assembly?

I wrote two programs that each print two strings: one in assembly and the other in C. This is the assembly program:

    .section .data
    string1:
        .ascii "Hola\0"
    string2:
        .ascii "Adios\0"
    .section .text
    .globl _start
    _start:
        pushl $string1
        call puts
        addl $4, %esp
        pushl $string2
        call puts
        addl $4, %esp
        movl $1, %eax
        movl $0, %ebx
        int $0x80

I am building the program using:

    as test.s -o test.o
    ld -dynamic-linker /lib/ld-linux.so.2 -o test test.o -lc

And the output is as expected:

    Hola
    Adios

This is the C program:

    #include <stdio.h>

    int main(void)
    {
        puts("Hola");
        puts("Adios");
        return 0;
    }

I get the expected result here too, but when I convert this C program to assembly with gcc -S (the OS is 32-bit Debian), the generated assembly source does not contain a null character in either string, as you can see here:

        .file   "testc.c"
        .section        .rodata
    .LC0:
        .string "Hola"
    .LC1:
        .string "Adios"
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        leal    4(%esp), %ecx
        .cfi_def_cfa 1, 0
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        .cfi_escape 0x10,0x5,0x2,0x75,0
        movl    %esp, %ebp
        pushl   %ecx
        .cfi_escape 0xf,0x3,0x75,0x7c,0x6
        subl    $4, %esp
        subl    $12, %esp
        pushl   $.LC0
        call    puts
        addl    $16, %esp
        subl    $12, %esp
        pushl   $.LC1
        call    puts
        addl    $16, %esp
        movl    $0, %eax
        movl    -4(%ebp), %ecx
        .cfi_def_cfa 1, 0
        leave
        .cfi_restore 5
        leal    -4(%ecx), %esp
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Debian 4.9.2-10) 4.9.2"
        .section        .note.GNU-stack,"",@progbits

My two questions are:

1) Why doesn't the assembly generated by gcc add a null character at the end of either string? I thought C did that automatically.

2) If I leave out the null characters in my hand-written assembly code, I get this output:

    HolaAdios
    Adios

I understand why I get "HolaAdios" on the first line, but why does the second puts stop correctly after "Adios" if that string is not null-terminated?

2 answers
  • .string always appends a null terminator, as shown here.
  • For the second question, you can check it yourself: puts simply keeps reading until it sees a null byte. \x00 bytes are very common, so there must have been one right after your string for this to work (possibly because of section alignment padding).

Just to add a little more detail:

Your second string is zero-terminated by accident, because nothing comes after it in your .data section. You are dynamically linking glibc, which also has a .data section that gets mapped into your process's address space. That is a private mapping, but I think it is mapped rather than copied, so it is page-aligned. The rest of the page in which your executable's data segment ends is filled with zeros. (The ABI might not guarantee this, but Linux has to do something to avoid leaking kernel data.)

When your executable is loaded into memory, the data segment is loaded separately from the text segment. See this answer for the difference between sections (which the linker cares about) and segments of the executable (which the program loader cares about).

Note that gcc puts string constants in the .rodata section, which the linker places in the text segment of the executable along with the .text section: it is read-only, so it can be shared between multiple processes running the same executable. Sections are aligned by default, with padding, so even if you had put your strings in .rodata without null terminators, there would be zero padding after the second one.

That would not be the case if the data ended exactly on an alignment boundary (for example, if its total length were a multiple of 16 or whatever the alignment is).

By the way, you can confirm that no non-printable garbage characters were printed after the string by running strace ./string-test. You should see:

    write(1, "Adios\n", 6) = 6


.string is a synonym for .asciz. The manual uses different wording to describe the fact that both process backslash escape sequences and append a null byte, but they do the same thing. The GNU assembler has many synonyms for compatibility with the assemblers shipped by various Unix vendors, so it can be confusing until you realize there is really no difference when, say, gcc uses .zero but clang uses .skip, or something like that.


Regarding "I am building the program using ...":

The commands you are using will only work on a 32-bit system. On a 64-bit host, you would be building a 64-bit binary that still uses the 32-bit system call ABI. (And the 32-bit dynamic linker path, so it would not even work by accident, even though static data addresses are in the low 32 bits and so could be passed to the 32-bit wrapper for sys_write.)

Also, I would recommend naming your source file test.S: a capital S is the usual convention for hand-written asm source. You can then use gcc -m32 -nostartfiles test.S -o test to assemble and link in the same way you did manually.

See this Q&A for full details on building asm for Linux: Assembling 32-bit binaries on a 64-bit system (GNU toolchain)

See also the x86 tag wiki for many interesting links.
