C: clarification on the translation unit

Question


If we have two .c files and one .h file (main.c, sub.c, and sub.h), where

main.c

 #include "sub.h" ...

sub.c

 #include "sub.h" ...

we can compile the program with i)

 gcc -o a.out main.c sub.c

or ii)

 gcc -c main.c
 gcc -c sub.c
 gcc -o a.out main.o sub.o

In this case, does the preprocessor output one translation unit or two?

I am confused because main.c includes sub.h, which suggests that the preprocessor outputs one compilation unit. On the other hand, two object files, main.o and sub.o, are created before the executable, which makes me think: "two source files, therefore two translation units."

Which part am I not understanding, or where am I making a mistake?



Electrojunkie Jan 29 '17 at 23:38


2 answers

Think of building an executable as a two-step process: first, each translation unit is compiled into an object file; call this step the compiler. Second, the object files are linked into an executable program; call this step the linker.

The "translation unit" belongs to the first step. Each translation unit starts from a file at which compilation begins (i.e., a file that is passed to the compiler). Most IDEs follow a rule that every file with a .c or .cpp extension is passed to the compiler as input, and no other files are. Therefore, files with extensions such as .h, .hpp, or .txt are usually not passed directly to the compiler.

In your example, main.c and sub.c presumably start the two translation units, while sub.h is not a translation unit on its own (it is only included in other translation units and is processed as part of compiling them).
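To make this concrete, here is a rough sketch of what the translation unit for sub.c looks like once the preprocessor has replaced the #include line with the text of sub.h. The question elides the real file contents, so the include guard SUB_H and the function subtract are hypothetical stand-ins.

```c
/* ---- text of sub.h, pasted in place of:  #include "sub.h" ---- */
/* SUB_H and subtract() are hypothetical; the real contents are elided. */
#ifndef SUB_H
#define SUB_H
int subtract(int a, int b);   /* declaration only */
#endif

/* ---- remaining text of sub.c ---- */
int subtract(int a, int b)    /* definition, compiled into sub.o */
{
    return a - b;
}
```

The translation unit for main.c would receive the same declaration from sub.h but no definition; its call to subtract is left unresolved in main.o and is matched against sub.o by the linker. You can inspect the real expansion of your own files with gcc -E sub.c.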

So you get two object files, one per translation unit, and the linker then combines them into the executable.

Note that even a .h file may contain a complete program, but unless you configure your environment to compile that .h file on its own, no object file will be generated from it.


Stephan lechner Jan 29 '17 at 23:56


giusti · Accepted Answer · 2017-01-30T00:23:56+0000

Here's what the C standard has to say about it:

A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. [..] Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

(Source: Draft Standard C99, 5.1.1.1 §1)
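The linkage wording in this passage can be illustrated with a single translation unit. In the sketch below (all names are hypothetical), call_count and sum_of_squares have external linkage, so another translation unit that declares them could use them after linking, while the static helper square has internal linkage and cannot be referred to by name from any other unit.

```c
/* One translation unit (hypothetical file stats.c). */

/* External linkage: another translation unit can refer to this object
   after declaring  extern int call_count;  and the linker resolves it. */
int call_count = 0;

/* Internal linkage: 'static' keeps this helper private to this
   translation unit; the linker never exposes it to other units. */
static int square(int x)
{
    return x * x;
}

/* External linkage: callable from other translation units that see a
   declaration such as  int sum_of_squares(int a, int b);  */
int sum_of_squares(int a, int b)
{
    call_count++;
    return square(a) + square(b);
}
```

If main.c declared int sum_of_squares(int a, int b); and called it, the call would be resolved at link time against this unit's object file, which is exactly the "calls to functions whose identifiers have external linkage" case in the quote.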

So, in both cases, you have two translation units. One of them comes from the compiler preprocessing main.c together with everything it pulls in through #include directives, that is, sub.h and possibly <stdio.h> and other headers. The second comes from the compiler doing the same with sub.c.

The difference between your first and second examples is that in the latter you explicitly preserve the "previously translated translation units" as object files.

Note that there is no rule tying an object file to any particular number of translation units: the GNU linker, for example, can combine two .o files into a single one.

The standard, as far as I know, does not specify the extension of source files. In practice, you are free to #include one .c file in another, or to put your entire program in a .h file. With gcc, you can use the -xc option to force a .h file to be treated as the starting point of a translation unit.

The distinction drawn in

A source file together with all the headers and source files included via the preprocessing directive #include [...]

is that a header need not be a source file. Similarly, the text between <...> in an #include directive need not be a valid file name. Exactly how the compiler locates the headers named with <...> and "..." is implementation-defined.
