Why not concatenate C source files before compiling?

I come from scripting languages, and the preprocessor in C has always seemed ugly to me. Nevertheless, I have embraced it while learning to write small programs in C. I only really use the preprocessor for including the standard libraries and the header files I have written for my own functions.

My question is: why don't C programmers just skip all the includes, simply concatenate their C source files, and compile that? If you do all your includes in one place, you only have to define what you need once, rather than in every source file.

Here is an example of what I am describing. Here I have three files:

    // includes.c
    #include <stdio.h>

    // main.c
    int main() {
        foo();
        printf("world\n");
        return 0;
    }

    // foo.c
    void foo() {
        printf("Hello ");
    }

By doing something like cat *.c > to_compile.c && gcc -o myprogram to_compile.c in my Makefile, I can reduce the amount of code I'm writing.

This means that I do not need to write a header file for each function I create (because they are already in the main source file), and also means that I do not need to include standard libraries in every file I create. This seems like a great idea to me!

However, I understand that C is a very mature programming language, and I imagine that someone else who is much smarter than me already had this idea and decided not to use it. Why not?

+74
c c-preprocessor compilation
Feb 09 '17 at 11:28
10 answers

Some programs are structured this way.

A typical example is SQLite. It is sometimes compiled as an amalgamation (generated at build time from many source files).

But this approach has pros and cons.

Obviously, the compilation time will increase quite a lot, so that is practical only if you do not compile that stuff very often.

Perhaps the compiler might optimize a bit more. But with link-time optimization (for example, with a recent GCC, compile and link with gcc -flto -O2) you can get the same effect (of course, at the cost of a longer build time).
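
For illustration, a minimal sketch of the two approaches; the file names foo.c and bar.c are made up, and the exact speed/size trade-off depends on your compiler version:

    # amalgamation: one big translation unit
    cat *.c > to_compile.c && gcc -O2 -o myprogram to_compile.c

    # separate translation units with link-time optimization
    gcc -flto -O2 -c foo.c
    gcc -flto -O2 -c bar.c
    gcc -flto -O2 -o myprogram foo.o bar.o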

I do not need to write a header file for each function

That is the wrong approach (having one header file per function). For a single-person project (less than a hundred thousand lines of code, that is below 100 KLOC, where KLOC = kilo lines of code) it is quite reasonable - at least for small projects - to have one single common header file (which you could precompile if using GCC), containing the declarations of all the public functions and types, and possibly the definitions of static inline functions (those small enough and called frequently enough to benefit from inlining). For example, the sash shell is organized that way (as is the lout formatter, with 52 KLOC).
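
As a rough sketch of what such a single common header could look like (the names common.h, parse_input, report_results and clamp are invented for illustration):

    /* common.h -- the project's only header (hypothetical example) */
    #ifndef COMMON_H
    #define COMMON_H

    #include <stdio.h>
    #include <stdlib.h>

    /* public functions, implemented in the various .c files */
    void parse_input(const char *path);
    void report_results(void);

    /* small, frequently called helper: defining it static inline in the
       header lets every translation unit inline it */
    static inline int clamp(int x, int lo, int hi) {
        return x < lo ? lo : (x > hi ? hi : x);
    }

    #endif /* COMMON_H */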

You might also have a few header files, and perhaps some single "grouping" header which #include-s all of them (and which you could precompile). See, for example, jansson (which in fact has a single public header file) and GTK (which has lots of internal headers, but most applications using it have just one #include <gtk/gtk.h>, which in turn includes all the internal headers). At the other extreme, POSIX has a big lot of header files, and it documents which ones should be included and in what order.
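
A "grouping" header in that GTK-like style could be sketched as follows (the myproj-* header names are hypothetical):

    /* myproj.h -- grouping header: the only one users need to include */
    #ifndef MYPROJ_H
    #define MYPROJ_H

    #include "myproj-core.h"
    #include "myproj-io.h"
    #include "myproj-util.h"

    #endif /* MYPROJ_H */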

Some people prefer to have lots of header files (and some even prefer to put a single function declaration in its own header). I do not (for personal projects, or small projects on which only two or three people would be coding), but it is a matter of taste. BTW, when a project grows a lot, it often happens that the set of header files (and of translation units) changes significantly. Look also at REDIS (it has 139 .h header files and 214 .c files, i.e. translation units, totaling 126 KLOC).

Having one or several translation units is also a matter of taste (and of convenience, habits and conventions). My preference is for source files (i.e. translation units) that are not too small, typically several thousand lines each, and often (for a small project of less than 60 KLOC) a single common header file. Don't forget to use a build automation tool such as GNU make (often with parallel builds via make -j; then several compilation processes run at the same time). The advantage of organizing your source files this way is that compilation stays reasonably fast. BTW, in some cases metaprogramming is worthwhile: some of your (internal header or translation unit) C "source" files could be generated by something else (for example by some AWK script, by some specialized C program such as bison, or by your own tool).
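
As an illustration of that kind of metaprogramming, a Makefile could generate some of its C sources at build time; keywords.txt, gen-keywords.awk and parser.y are hypothetical names, and recipe lines must start with a tab:

    # keywords.c is a generated translation unit (names are hypothetical)
    keywords.c: keywords.txt gen-keywords.awk
    	awk -f gen-keywords.awk keywords.txt > $@

    # the parser is generated by bison from its grammar
    parser.c: parser.y
    	bison -o $@ parser.y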

Remember that C was designed in the 1970s, for computers much smaller and slower than your favorite laptop today (typically with at most a megabyte of memory, or even just a few hundred kilobytes, and running at least a thousand times slower than your mobile phone today).

I strongly suggest studying the source code of, and building, some existing free software projects (for example on GitHub or SourceForge, or from your favorite Linux distribution). You will learn that they use different approaches. Remember that in C, conventions and habits matter a great deal in practice, so there are different ways to organize your project into .c and .h files.

It also means that I don’t need to include standard libraries in every file I create.

You include header files, not libraries (but you do have to link against the libraries). You can include them in every .c file (and many projects do that), or you can include them in one single header and precompile that header, or you can have a dozen headers and include them after the system headers in each compilation unit. YMMV. Note that on modern computers preprocessing time is small (at least when you ask the compiler to optimize, since optimization takes much longer than parsing and preprocessing).
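
If I recall GCC's precompiled-header mechanism correctly, a sketch could look like this; common.h and main.c are hypothetical, and the .gch is only used when later compilations use compatible options:

    # precompile the common header; GCC looks for common.h.gch automatically
    gcc -Wall -g -x c-header common.h -o common.h.gch

    # later compilations of files that #include "common.h" pick up the .gch
    gcc -Wall -g -c main.c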

Note that what goes into some #include-d file is a matter of convention (it is not defined by the C specification). Some programs have some code in such a file (which then should not be called a "header", but simply an "included file", and which should then not have the .h suffix, but something else such as .inc). Look at XPM files for an example. At the other extreme, you could in principle have no header files of your own (you still need the headers from the implementation, such as <stdio.h> or <dlfcn.h> from your POSIX system) and copy-and-paste duplicated code into your .c files — e.g. have the line int foo(void); in every .c file — but that is very bad practice and is frowned upon. However, some programs do generate C files sharing some common content.
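
For instance, a non-header included file might just carry data, loosely in the spirit of XPM or X-macro tables; the colors.inc name and contents are a made-up sketch:

    /* colors.inc -- not a header, just data pulled in by #include */
    { "red",   0xff0000 },
    { "green", 0x00ff00 },
    { "blue",  0x0000ff },

    /* colors.c */
    struct color { const char *name; unsigned rgb; };
    static const struct color colors[] = {
    #include "colors.inc"
    };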

BTW, neither C nor C++14 has modules (as OCaml has, for example). In other words, in C a module is mostly a convention.

(Note that having many thousands of very small .h and .c files of just a few dozen lines each can slow the build down drastically; having hundreds of files of a few hundred lines each is more reasonable, in terms of build time.)

If you begin working on a single-person project in C, I would suggest first having one header file (and precompiling it) and several .c translation units. In practice, you will change .c files much more often than .h ones. Once you have more than 10 KLOC, you might refactor that into several header files. Such refactoring is tricky to design, but easy to carry out (just copy and paste chunks of code). Other people would give different suggestions and hints (and that is fine!). But be sure to enable all warnings and debug info when compiling (so compile with gcc -Wall -g, and perhaps set CFLAGS = -Wall -g in your Makefile). Use the gdb debugger (and valgrind...). Ask for optimizations (-O2) when you benchmark an already-debugged program. Also use a version control system such as Git.
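
Concretely, a typical debug-first workflow might look like the following sketch; myprog, main.c and util.c are hypothetical names:

    # compile with all warnings and debug info while developing
    gcc -Wall -g -o myprog main.c util.c

    # debug interactively, and hunt memory errors
    gdb ./myprog
    valgrind ./myprog

    # only when benchmarking the already-debugged program, turn on optimizations
    gcc -Wall -O2 -o myprog main.c util.c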

On the contrary, if you are designing a larger project on which several people will work, it is better to have several files - even several header files - (intuitively, each file has a single person mainly responsible for it, with others contributing only minor changes to it).

In a comment, you add:

I'm talking about writing code in many different files, but using Makefiles to concatenate them

I don't understand why that would be useful (except in very weird cases). It is much better (and very common practice) to compile each translation unit (e.g. each .c file) into its own object file (a .o ELF file on Linux) and to link them together later. This is easy with make (in practice, when you change only one .c file, for example to fix a bug, only that file gets recompiled, and the incremental build is very fast), and you can ask it to build the object files in parallel with make -j (and then your build goes very fast on your multi-core processor).
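
A minimal Makefile along those lines might look like this sketch; the layout (main.c, foo.c, bar.c sharing one common.h, producing myprogram) is hypothetical, and recipe lines must start with a tab:

    CFLAGS = -Wall -g
    OBJS = main.o foo.o bar.o

    myprogram: $(OBJS)
    	$(CC) $(CFLAGS) -o $@ $(OBJS)

    $(OBJS): common.h    # recompile the objects when the shared header changes

    clean:
    	$(RM) myprogram $(OBJS)

Each .o is built from the matching .c by make's built-in rule; after touching only foo.c, make -j recompiles just foo.o and relinks.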

+105
Feb 09 '17 at 11:32

You could do this, but we like to split the programs into separate translation units, mainly because:

  • It speeds up builds. You only need to rebuild the files that have changed, and they can be linked with the other, already-compiled files to form the final program (see the commands sketched after this list).

  • The C standard library consists of precompiled components. Do you really want to recompile it all?

  • It is easier to collaborate with other programmers if the code base is divided into different files.
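
For the first point, a rough sketch of what that looks like in practice; the file and program names are made up:

    # first build: compile each translation unit, then link
    gcc -c main.c foo.c bar.c      # produces main.o, foo.o, bar.o
    gcc -o myprogram main.o foo.o bar.o

    # after editing only foo.c: recompile that one file and relink
    gcc -c foo.c
    gcc -o myprogram main.o foo.o bar.o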

+29
Feb 09 '17 at 11:32
  • With modularity, you can share your library without sharing the code.
  • For large projects, if you changed a single file you would otherwise end up recompiling the complete project.
  • If you try to compile a large project as a single file, you may run out of memory.
  • Your code may have circular dependencies between modules; modularity helps in managing them.

There may be some advantages to your approach, but for languages like C, compiling each module separately makes more sense.

+18
Feb 09 '17 at 11:32

Because splitting things up is good program design. Good program design is about modularity, self-contained code modules, and code re-use. As it turns out, common sense will take you very far when designing a program: things that don't belong together shouldn't be placed together.

Placing unrelated code in different translation units means that you can localize the scope of variables and functions as much as possible.

Merging things together creates tight coupling, meaning awkward dependencies between code files that really shouldn't even need to know about each other. This is why a "global.h" that contains everything in the project is bad: it creates tight coupling between every unrelated file in the whole project.

Suppose you are writing firmware to control a car. One module in the program controls the FM radio. Then you reuse the radio code in another project, to control the FM radio in a smartphone. And then your radio code will not compile, because it cannot find the brakes, wheels, gears, etc. — things that don't make the slightest sense for an FM radio, let alone for the smartphone, which has no need to know about them.

Even worse, with tight coupling, bugs escalate throughout the whole program instead of staying local to the module where the bug actually sits. This makes the consequences of bugs far more severe. You write a bug in the FM radio code, and suddenly the car brakes stop working — even though you never touched the brake code in the update that introduced the bug.

If a bug in one module completely breaks unrelated things, it is almost certainly caused by poor program design. And a sure way to achieve poor program design is to lump everything in the project together into one big blob.

+16
Feb 09 '17 at 12:23

Your approach to concatenating .c files is completely broken:

  • Although the command cat *.c > to_compile.c will put all the functions into one file, the order matters: you must declare each function before its first use.

    That is, you have dependencies between your .c files that force a certain order. If your concatenation command does not follow this order, you will not be able to compile the result.

    Also, if you have two functions that use each other recursively, there is absolutely no way around writing a forward declaration for at least one of the two. You may as well put those forward declarations into a header file, where people expect to find them (a small sketch of this appears after the list).

  • When you merge everything into a single file, you force a complete rebuild every time one line in your project changes.

    When using the classic .c/.h compilation approach, a change to a function implementation requires recompiling exactly one file, while a change to a header requires recompiling only the files that actually include that header. This can easily speed up the rebuild after a small change by a factor of 100 or more (depending on the number of .c files).

  • You lose all the possibilities of parallel compilation when you merge everything into one file.

    Do you have a big, fat 12-core hyper-threading processor? Sorry, your concatenated source file is compiled by a single thread. You just lost a speedup of a factor greater than 20... OK, that's an extreme example, but I have built software with make -j16, and let me tell you, it can make a huge difference.

  • Compilation time is usually not linear.

    Typically, compilers contain at least some algorithms with quadratic runtime behavior. Consequently, there is usually some threshold beyond which aggregated compilation is actually slower than compiling the parts independently.

    Obviously, the exact location of that threshold depends on the compiler and on the optimization flags you pass to it, but I have seen a compiler take half an hour on a single huge source file. You do not want to have such an obstacle in your change-compile-test cycle.
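
To illustrate the mutual-recursion point from the first bullet, here is a tiny sketch (the function names are invented):

    /* is_even and is_odd call each other, so at least one
       forward declaration is unavoidable */
    #include <stdio.h>

    static int is_odd(unsigned n);               /* forward declaration */

    static int is_even(unsigned n) { return n == 0 ? 1 : is_odd(n - 1); }
    static int is_odd(unsigned n)  { return n == 0 ? 0 : is_even(n - 1); }

    int main(void) {
        printf("%d\n", is_even(10));              /* prints 1 */
        return 0;
    }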

Make no mistake: despite all these problems, there are people who use .c file concatenation in practice, and some C++ programmers arrive at pretty much the same point by moving everything into templates (so the implementation lives in the .hpp file and there is no associated .cpp file), letting the preprocessor do the concatenation. I don't see how they can ignore these problems, but they do.

Also note that many of these problems only occur with large project sizes. If your project is less than 5,000 lines of code, it is still relatively unimportant how you compile it. But when you have over 50,000 lines of code, you definitely need a build system that supports incremental and parallel builds. Otherwise, you are wasting your work time.

+15
Feb 10 '17 at 11:32

Header files should define interfaces - that is the desirable convention. They are not meant to declare everything that is in the corresponding .c file or group of .c files; instead, they declare only the functionality of those .c files that is available to their users. A well-designed .h file forms the basic documentation of the interface the .c file exposes, even if it doesn't contain a single comment. One good way to approach the design of a C module is to write the header file first, and then implement it in one or more .c files.

Corollary: functions and data structures internal to the implementation of a .c file do not belong in the header file. You may need forward declarations for them, but those should be local, and the variables and functions declared and defined there should be static: if they are not part of the interface, the linker should never see them.
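
A minimal sketch of this convention; the stack module here is a made-up example:

    /* stack.h -- the interface other files see */
    #ifndef STACK_H
    #define STACK_H

    void stack_push(int value);
    int  stack_pop(void);

    #endif

    /* stack.c -- the implementation */
    #include "stack.h"

    static int items[100];      /* internal state: not visible to other files */
    static int top = 0;         /* static = internal linkage */

    static int is_full(void) { return top == 100; }   /* helper, not in the header */

    void stack_push(int value) { if (!is_full()) items[top++] = value; }
    int  stack_pop(void)       { return top > 0 ? items[--top] : 0; }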

+12
Feb 09 '17 at 22:07

The main reason is compilation time. Recompiling one small file after changing it takes only a short time. If instead you had to recompile the whole project every time you changed a single line — say, 10,000 files — that could take a very long time.

If you have - as in the example above - 10,000 source files and compiling one takes 10 ms, then after changing a single file the whole project rebuilds in (10 ms + link time) if you recompile only the modified file, or in (10 ms * 10,000 + a slightly shorter link time) if you compile everything as a single concatenated blob.

+9
Feb 09 '17 at 11:31

While you can still write your program in a modular way and build it as a single translation unit, you will miss out on all the mechanisms C offers to make that modularity possible. With multiple translation units you have precise control over your module interfaces, by using e.g. extern and static.

By merging your code into a single translation unit, you will miss any modularity issues that may arise, because the compiler will not warn you about them. In a large project this will eventually lead to unintended dependencies spreading all around.

+8
Feb 09 '17 at 13:07

[The body of this answer was not recoverable from the extraction. Judging by the surviving fragments, it discussed how declarations are organized in .h files, the idea of a single everything.h header, and the pros and cons of that approach compared with keeping separate .c files.]

+5
Feb 09 '17 at 21:49

One more thing to keep in mind: concatenating files changes the meaning of internal linkage. Identifiers declared static are private to their translation unit, and merging several .c files into one translation unit merges those private namespaces, so code can start (or stop) compiling simply because of how the files were combined. Consider these two files:

    // a.c
    static void utility() { }
    static void a_func() { utility(); }

    // b.c
    static void b_func() { utility(); }

And the concatenation of the two, done with the preprocessor after a single forward declaration:

    // ab.c
    static void utility();
    #include "a.c"
    #include "b.c"

a.c compiles on its own, b.c does not, yet ab.o can be built from ab.c without complaint.

Is ab.c really what you want, though?

Probably not: b.c now silently depends on a function that a.c kept private with static. If b.c genuinely needs utility(), the cleaner fix is to make it part of an explicit interface — give it external linkage and declare it extern in a header — so that a.c and b.c can also be compiled separately.

+3
Feb 09 '17 at 11:57


