Technically, how do variadic functions work? How does printf work?

I know that I can use va_arg to write my own variadic functions, but how do variadic functions work under the hood, that is, at the assembly instruction level?

For example, how is it possible that printf accepts a variable number of arguments?




* No rule without exception: there is no such language as C/C++, but this question can be answered for both C and C++.

* Note: the answer was originally given to the question "How can the printf function accept a variable number of parameters when they are output?", but it did not seem to apply to that questioner.

+54
c++ c variadic-functions
Apr 16 '14 at 9:00
2 answers

The C and C++ standards do not impose any requirements on how this has to work. A conforming compiler may well decide to emit linked lists, std::stack<boost::any>, or even magical pony dust (as per @Xeo's comment) under the hood.

However, it is usually implemented as follows, even though transformations such as inlining or passing arguments in CPU registers may leave nothing of the code discussed here.

Also note that this answer deliberately depicts a downward-growing stack in the diagrams below, and that it is a simplification meant only to demonstrate the scheme (see https://en.wikipedia.org/wiki/Stack_frame ).

How to call a function with a variable number of arguments

This is possible because the underlying architecture of the machine has a so-called "stack" for each thread. The stack is used to pass arguments to functions. For example, when you have:

 foobar("%d%d%d", 3,2,1); 

Then this compiles to assembly code like the following (schematic and for illustration only; real code may look different). Note that the arguments are pushed from right to left:

 push 1
 push 2
 push 3
 push "%d%d%d"
 call foobar

These push operations populate the stack:

               []            // empty stack
 -------------------------------
 push 1:       [1]
 -------------------------------
 push 2:       [1]
               [2]
 -------------------------------
 push 3:       [1]
               [2]
               [3]           // there is now 1, 2, 3 in the stack
 -------------------------------
 push "%d%d%d":[1]
               [2]
               [3]
               ["%d%d%d"]
 -------------------------------
 call foobar   ...           // foobar uses the same stack!

The bottom-most element in these diagrams is called the "top of stack", often abbreviated TOS.

The foobar function now accesses the stack, starting at the TOS, that is, the format string, which as you remember was pushed last. Imagine that stack is your stack pointer, stack[0] is the value at the TOS, stack[1] is the one above the TOS, and so on:

 format_string <- stack[0] 

... and then parses the format string. While parsing, it recognizes the %d tokens and, for each one, loads one more value from the stack:

 format_string <- stack[0]
 offset <- 1
 while (parsing):
     token = tokenize_one_more(format_string)
     if (needs_integer (token)):
         value <- stack[offset]
         offset = offset + 1
     ...

This is, of course, very incomplete pseudocode, but it demonstrates how the function has to rely on the arguments it was passed to find out how much it must load from the stack.
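To make the pseudocode above a little more concrete, here is a minimal sketch in C that models the stack as a plain array. The names slot_t and foobar_sim are made up for this illustration, and a real compiler does not represent the stack this way; the point is only that the format string in slot 0 decides how many further slots get read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one stack slot: a pointer or an integer. */
typedef union {
    const char *str;
    int         num;
} slot_t;

/*
 * Walks a simulated stack as in the pseudocode: stack[0] is the format
 * string (TOS), and every %d token loads one more slot.
 * Appends each loaded value to out as "[n]" and returns how many slots
 * were read in total, including the format string itself.
 */
static size_t foobar_sim(const slot_t *stack, char *out, size_t out_size)
{
    assert(stack != NULL && out != NULL && out_size > 0);

    const char *format_string = stack[0].str;
    size_t offset = 1;
    size_t used = 0;

    out[0] = '\0';
    for (const char *p = format_string; *p != '\0'; ++p) {
        if (strncmp(p, "%d", 2) == 0) {
            int n = snprintf(out + used, out_size - used, "[%d]",
                             stack[offset].num);
            if (n < 0 || (size_t)n >= out_size - used)
                break;              /* output buffer full: stop early */
            used += (size_t)n;
            ++offset;
            ++p;                    /* skip the 'd' */
        }
    }
    return offset;
}
```

Calling foobar_sim on a simulated stack holding "%d%d%d", 3, 2, 1 reads four slots. If the format string asked for more values than were pushed, the function would happily read past the end of the array, which is the problem discussed next.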

Security

This reliance on user-provided arguments is also one of the biggest security issues around (see https://cwe.mitre.org/top25/ ). Users can easily use a variadic function incorrectly, either because they did not read the documentation, or forgot to adjust the format string or the argument list, or because they are plain evil. See also "format string attack".
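As a small illustration of the usual defence, the sketch below never lets untrusted text act as the format string. The function name format_untrusted is made up for this example; snprintf is the standard library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats untrusted text into out. The format string is the fixed
 * literal "%s", so conversion specifiers inside `untrusted` are copied
 * as plain characters and never interpreted.
 * Returns what snprintf returns: the length the full output would have.
 */
static int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);

    /* Unsafe alternative (do not do this):
     *     snprintf(out, out_size, untrusted);
     * There the input itself would decide how many arguments get read. */
    return snprintf(out, out_size, "%s", untrusted);
}
```

With the input "%d%n", the output buffer simply contains the four characters %d%n, because the input is treated as data and not as instructions for reading further arguments.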

C implementation

In C and C++, variadic functions are used together with the va_list interface. While pushing onto the stack is intrinsic to those languages (in K&R C you could even forward-declare a function without stating its arguments, and still call it with any number and kind of arguments), reading from such an unknown argument list is done through the va_... macros and the va_list type, which essentially abstract the low-level stack-frame access.
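For comparison with the stack picture, here is a rough sketch of a printf-like function written against the va_list interface. mini_format is a hypothetical name, and it understands only %d; it is nowhere near a real printf implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical mini formatter: understands only %d and copies every
 * other character through. Each %d pulls one int via va_arg, so the
 * format string alone decides how many arguments are read.
 * Returns the number of characters written (excluding the terminator).
 */
static size_t mini_format(char *out, size_t out_size, const char *fmt, ...)
{
    va_list args;
    size_t used = 0;

    assert(out != NULL && fmt != NULL && out_size > 0);

    va_start(args, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < out_size; ++p) {
        if (strncmp(p, "%d", 2) == 0) {
            int value = va_arg(args, int);
            int n = snprintf(out + used, out_size - used, "%d", value);
            if (n < 0 || (size_t)n >= out_size - used)
                break;              /* not enough room for this number */
            used += (size_t)n;
            ++p;                    /* skip the 'd' */
        } else {
            out[used++] = *p;
        }
    }
    va_end(args);

    out[used] = '\0';
    return used;
}
```

A call such as mini_format(buf, sizeof buf, "%d-%d-%d", 3, 2, 1) reads exactly three ints, one per %d. Nothing in the function knows how many arguments the caller actually supplied.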

+68
Apr 16 '14 at 9:00

Variadic functions are defined by the standard, with very few explicit restrictions. Here is an example taken from cplusplus.com.

 /* va_start example */
 #include <stdio.h>      /* printf */
 #include <stdarg.h>     /* va_list, va_start, va_arg, va_end */

 void PrintFloats (int n, ...)
 {
   int i;
   double val;
   printf ("Printing floats:");
   va_list vl;
   va_start(vl,n);
   for (i=0;i<n;i++)
   {
     val=va_arg(vl,double);
     printf (" [%.2f]",val);
   }
   va_end(vl);
   printf ("\n");
 }

 int main ()
 {
   PrintFloats (3,3.14159,2.71828,1.41421);
   return 0;
 }

The assumptions are roughly as follows.

  • There must be (at least one) first, fixed, named argument. The ... itself does nothing, except tell the compiler to do the right thing.
  • The fixed argument(s) provide information about how many variadic arguments there are, by an unspecified mechanism.
  • From the fixed argument, the va_start macro can return an object that allows the arguments to be retrieved. Its type is va_list.
  • From the va_list object, va_arg can iterate over each variadic argument and coerce its value into a compatible type.
  • Something weird might have happened in va_start, so va_end puts things right again.
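The "unspecified mechanism" does not have to be a count. As one more illustration, the sketch below uses a sentinel value instead; sum_until_zero is a made-up name, and treating 0 as the terminator is purely this example's own convention.

```c
#include <stdarg.h>

/*
 * Sums ints until it meets the sentinel value 0. `first` is the required
 * named argument; the terminating 0 is this function's own convention
 * for telling the callee where the list ends.
 */
static int sum_until_zero(int first, ...)
{
    va_list args;
    int total = 0;

    va_start(args, first);
    for (int value = first; value != 0; value = va_arg(args, int))
        total += value;
    va_end(args);

    return total;
}
```

Here sum_until_zero(1, 2, 3, 0) adds up 1, 2 and 3. Forgetting the trailing 0 would make the function keep reading arguments that were never passed, which is undefined behaviour.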

In the most usual stack-based situation, va_list is merely a pointer to the arguments sitting on the stack, and va_arg increments the pointer, casts it and dereferences it to a value. Then va_start initialises that pointer with some simple arithmetic (and inside knowledge), and va_end does nothing. There is no strange assembly language, just some inside knowledge of where things lie on the stack. Read the macros in the standard headers to find out what that is.

Some compilers (MSVC) require a specific calling sequence, in which the caller releases the stack rather than the callee.

Functions like printf work exactly like this. The fixed argument is a format string, which allows the number of arguments to be calculated.

Functions like vsprintf take the va_list object as an ordinary argument.
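The "just a pointer" picture can be imitated in portable C with a toy model. Everything named toy_ or TOY_ below is invented for this sketch, as is the slot size; real <stdarg.h> headers typically rely on compiler builtins, and on ABIs that pass arguments in registers (x86-64 System V, for instance) va_list is more than a single pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for va_list: a cursor into a packed argument area. */
typedef struct {
    const unsigned char *next;
} toy_va_list;

/* Assumed slot size; every argument is rounded up to whole slots. */
#define TOY_SLOT sizeof(long long)
#define TOY_ROUND(size) (((size) + TOY_SLOT - 1) / TOY_SLOT * TOY_SLOT)

/* Stands in for the caller's push: copy a value in, advance the cursor. */
static void toy_push(unsigned char **cursor, const void *value, size_t size)
{
    memcpy(*cursor, value, size);
    *cursor += TOY_ROUND(size);
}

/* Like va_start: point the cursor at the first argument slot. */
static void toy_va_start(toy_va_list *ap, const void *first_slot)
{
    ap->next = first_slot;
}

/* Like va_arg(ap, int): read the value at the cursor, then advance. */
static int toy_va_arg_int(toy_va_list *ap)
{
    int value;
    memcpy(&value, ap->next, sizeof value);
    ap->next += TOY_ROUND(sizeof value);
    return value;
}

/* Like va_arg(ap, double): same idea, different type and size. */
static double toy_va_arg_double(toy_va_list *ap)
{
    double value;
    memcpy(&value, ap->next, sizeof value);
    ap->next += TOY_ROUND(sizeof value);
    return value;
}
```

The reader has to ask for the types in the same order in which they were pushed. Asking for a double where an int was stored would silently reinterpret the bytes, just as a wrong va_arg type does.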
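A common pattern built on this is a variadic front end that starts the list and hands it to a v-style worker. The sketch below assumes two made-up functions, tagged and vtagged; vsnprintf and snprintf are the standard library functions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical v-style worker: receives the already-started va_list. */
static int vtagged(char *out, size_t out_size, const char *tag,
                   const char *fmt, va_list args)
{
    int prefix = snprintf(out, out_size, "[%s] ", tag);
    if (prefix < 0 || (size_t)prefix >= out_size)
        return prefix;              /* error, or no room left for the body */

    int body = vsnprintf(out + prefix, out_size - (size_t)prefix, fmt, args);
    return body < 0 ? body : prefix + body;
}

/* Variadic front end: starts the list, hands it on, then ends it. */
static int tagged(char *out, size_t out_size, const char *tag,
                  const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    int written = vtagged(out, out_size, tag, fmt, args);
    va_end(args);

    return written;
}
```

For example, tagged(buf, sizeof buf, "info", "%d items, %s", 3, "ok") produces "[info] 3 items, ok". Splitting the work this way lets other variadic wrappers reuse vtagged without re-parsing anything.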

If you need more low-level detail, please add to the question.

+5
May 18 '14 at 5:50