Assigning a string literal to a char array, how does a string literal get copied to the stack?

I understand that when you write char array[] = "string", the string literal "string" is copied from the data segment onto the stack. Is the string literal copied character by character? Or does the compiler take the start and end addresses of the string literal and copy the entire string onto the stack at once?

thanks

4 answers

The compiler does whatever it wants, as long as the observed result is the same. Sometimes there is no copy at all.

The C standard does not specify how copying is done, so a C implementation is free to achieve results by any means. The only requirement imposed by the C standard is that the observed results (such as text written to standard output) must be as defined.

When engineers design a high-quality C implementation, they spend time considering the best ways to copy a string in such a situation, and they strive to build a compiler that chooses the best way in each case. A short string can be built in place using "move immediate value" instructions. A long string can be copied by calling memcpy. An intermediate-length string can be copied using an inlined memcpy, which in effect is several instructions that each move several bytes.

When engineers develop a low-cost implementation of C, something that just does the job so that code can be ported to the machine but does not have to be fast, they will do whatever is easiest for them.

Sometimes the compiler does not copy the string at all: if the compiler can tell that you do not need a copy, there is no reason to make one. For example, if the compiler sees that you are simply passing the string to printf and not changing it at all, then it can get the same result without making a copy, by passing the original to printf.


I'm not sure what you mean by your distinction between copying "character by character" and copying the "whole string". A string is usually not a machine-level object, which means that it is not possible to copy it as a "whole string". How do you expect this to happen?

The string will always be copied "character by character", at least conceptually. When it comes to copying extended memory regions, the compiler can optimize the process by copying word by word (rather than byte by byte) whenever possible. A similar optimization can be implemented at the processor microarchitecture level.

But in any case, copying is generally implemented as an iterative process, and not as some kind of atomic operation on a "whole string".

In addition, a smart compiler can figure out that in some cases copying is not required at all. For example, if your code does not modify the array object and does not rely on its address identity, the compiler may simply decide to use the original string literal directly, without any copy whatsoever (i.e. basically quietly replace your char array[] = "string" with const char *array = "string").


There is no reason to think that there is a copy at all.

Take for example the following code.

 int main() { char c[] = "hi"; } 

For me, this produces the following (non-optimized) assembly:

    main:
        pushq   %rbp
        movq    %rsp, %rbp
        movw    $26984, -16(%rbp)
        movb    $0, -14(%rbp)
        movl    $0, %eax
        popq    %rbp
        ret

The array is initialized by storing the 16-bit value 26984 (0x6968) into it, followed by a zero byte for the terminator. On this little-endian machine the value is stored as the two bytes 0x68 and 0x69, which are the ASCII codes for 'h' and 'i'. There is no data segment entry representing the string at all, and the array is not initialized by copying anything into it, whether character by character or in some smarter way.

Of course, this is only one compiler implementation (g++ 4.8), and other compilers can do whatever they want, as long as they meet the language specification.


It depends on the compiler and the target architecture.

There may be very simple target architectures, such as microcontrollers, that have no instructions for copying blocks of memory. There are probably also very simple teaching compilers that generate a byte-by-byte copy even on architectures that support more efficient methods.

However, you can assume that production-grade compilers do the sensible thing and generate the fastest code for the most popular architectures in this case, so you do not need to worry about it.

Still, the best way to check is to read the assembly the compiler generates.

Take this test code (stack_array_init.c):

    #include <stdio.h>

    int main()
    {
        char a[]="Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed\n"
                 "do eiusmod tempor incididunt ut labore et dolore magna aliqua.\n";
        printf("%s", a);
        return 0;
    }

Compile it to assembly with gcc, optimizing for size (so there is less to read), for example:

 gcc -Os -S stack_array_init.c 

Here is the result for x86-64:

        .file   "stack_array_init.c"
        .section    .rodata.str1.1,"aMS",@progbits,1
    .LC1:
        .string "%s"
    .LC0:
        .string "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed\ndo eiusmod tempor incididunt ut labore et dolore magna aliqua.\n"
        .section    .text.startup,"ax",@progbits
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        subq    $136, %rsp
        .cfi_def_cfa_offset 144
        movl    $.LC0, %esi
        movl    $126, %ecx
        leaq    2(%rsp), %rdi
        xorl    %eax, %eax
        rep movsb
        leaq    2(%rsp), %rsi
        movl    $.LC1, %edi
        call    printf
        xorl    %eax, %eax
        addq    $136, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Debian 4.7.2-5) 4.7.2"
        .section    .note.GNU-stack,"",@progbits

Here "rep movsb" is the instruction that copies the string onto the stack: it moves 126 bytes (the count in %ecx) from .LC0 to the array at 2(%rsp).

Here is an excerpt from the assembly generated for ARMv4 (which might be easier to read):

    main:
        @ Function supports interworking.
        @ args = 0, pretend = 0, frame = 128
        @ frame_needed = 0, uses_anonymous_args = 0
        str     lr, [sp, #-4]!
        sub     sp, sp, #132
        mov     r2, #126
        ldr     r1, .L2
        mov     r0, sp
        bl      memcpy
        mov     r1, sp
        ldr     r0, .L2+4
        bl      printf
        mov     r0, #0
        add     sp, sp, #132
        ldr     lr, [sp], #4
        bx      lr
    .L3:
        .align  2
    .L2:
        .word   .LC0
        .word   .LC1
        .size   main, .-main
        .section    .rodata.str1.4,"aMS",%progbits,1
        .align  2
    .LC1:
        .ascii  "%s\000"
        .space  1
    .LC0:
        .ascii  "Lorem ipsum dolor sit amet, consectetur adipisicing"
        .ascii  " elit, sed\012do eiusmod tempor incididunt ut labor"
        .ascii  "e et dolore magna aliqua.\012\000"
        .ident  "GCC: (Debian 4.6.3-14) 4.6.3"
        .section    .note.GNU-stack,"",%progbits

As far as I understand ARM assembly, this code calls memcpy to copy the string into the stack array. Although the listing does not show the assembly for memcpy itself, I would expect it to use one of the fastest methods available.
