Chrome: how 1.5 GB of source code is compressed into a 50 MB executable

This puzzles me. I just downloaded a 1.5 GB archive of the Chrome source code, yet that same code compiles to a 50 MB executable.

Why is there such a mismatch between the size of the source code and the size of the executable?

+4
4 answers

A list of things that can cause this.

  • The executable does not need whitespace, comments or nice formatting. The source code can contain TONS of documentation and whitespace to keep the code readable, and all of that takes up space.

  • The source tree can ship with a lot of other code for testing the application. That test code never ends up in the final application.

  • Documentation shipped alongside the code. Depending on the format (.doc or .docx, for example), the documentation can be huge.

  • Someone else mentioned that version control comments can also be in the code. Including commit messages in the source files can also make them large.

  • I do not know how or when you compared the sizes, but if you did it AFTER compiling, you may also have included build artifacts (*.o files) in your calculation. In that case you might believe the source code is 1.5 GB when it is really only 750 MB (roughly speaking).

  • Depending on the compiler and how good it is, it may generate less machine code and thereby produce a smaller file. That said, I think most compilers are reasonable today, so this should not account for much of the size difference (but I may be wrong, I am not a compiler expert).

  • If an application is statically compiled with all libraries, it will be larger, because now it should contain its dependencies. However, if the libraries are dynamically linked / loaded, the executable itself may be significantly smaller, as it will simply reference the libraries at runtime and load them as needed.

Was the archive itself 1.5 GB, or was it 1.5 GB after extraction?

In any case, there can be many factors.
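As a rough illustration of the first point in the list above, here is a minimal Python sketch that strips comments and collapses whitespace in a small C-style snippet and compares the sizes. The snippet, the function name `strip_for_size` and the regular expressions are all made up for this example; the regexes are deliberately naive (they ignore comment markers inside string literals), so treat this as an estimate of the effect, not as what a compiler actually does.

```python
import re

# Naive patterns: they ignore comment markers inside string literals,
# which is fine for a rough size estimate but not for real tooling.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_for_size(source: str) -> str:
    """Remove C-style comments and collapse whitespace runs to one space."""
    without_block = BLOCK_COMMENT.sub("", source)
    without_line = LINE_COMMENT.sub("", without_block)
    return " ".join(without_line.split())


# Hypothetical source file: mostly text that only humans care about.
SAMPLE = """\
// Copyright notice: this header exists only for human readers.

/*
 * Returns the sum of two integers.
 */
int add(int a, int b) {
    return a + b;  // the only line that turns into instructions
}
"""

stripped = strip_for_size(SAMPLE)
print(len(SAMPLE), "bytes before,", len(stripped), "bytes after")
print(stripped)
```

Even in this tiny example most of the bytes are comments and layout, and none of them can influence the generated machine code.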

+9

The copyright / license headers at the top of the source files average 1,621 bytes. Chrome (without any svn / git / object / image files) has 73,510 source files (for this I counted .cc, .h, .cpp, .idl, .m, .js, .c and .py).

That is 119,159,710 bytes of copyright notices alone.

Or 116,366 kilobytes.

Or about 113 megabytes. Just in copyright notices.
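The arithmetic is easy to re-run. The short Python sketch below uses only the two figures quoted in this answer (73,510 files and an average of 1,621 bytes per header) and assumes binary units (1 KB = 1024 bytes); the variable names are just for illustration.

```python
# Figures quoted in the answer above.
SOURCE_FILES = 73_510
AVG_HEADER_BYTES = 1_621

total_bytes = SOURCE_FILES * AVG_HEADER_BYTES
total_kib = total_bytes // 1024     # whole kilobytes (1 KB = 1024 bytes)
total_mib = total_bytes / 1024**2   # megabytes (1 MB = 1024 * 1024 bytes)

print(f"{total_bytes:,} bytes")
print(f"{total_kib:,} KB")
print(f"{total_mib:.1f} MB")
```

With these inputs the total comes out at roughly 113.6 MB, which is a noticeable share of the archive but still far from explaining 1.5 GB on its own.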

To make matters worse, Chromium has bug reports filed indicating that it may even be violating its own license, since it mixes several different flavors and versions of open (and some not so open) licenses. [1]

Sources:

[1] https://code.google.com/p/chromium/issues/detail?id=28291

[2] I work with the chromium source code:

Trevors-Mac:src trevor$ find . -name "*.cc" | wc -l
   15941
Trevors-Mac:src trevor$ find . -name "*.h" | wc -l
   26125
Trevors-Mac:src trevor$ find . -name "*.cpp" | wc -l
    5191
Trevors-Mac:src trevor$ find . -name "*.idl" | wc -l
     881
Trevors-Mac:src trevor$ find . -name "*.m" | wc -l
     258
Trevors-Mac:src trevor$ find . -name "*.js" | wc -l
   13528
Trevors-Mac:src trevor$ find . -name "*.c" | wc -l
    7856
Trevors-Mac:src trevor$ find . -name "*.py" | wc -l
    3988
Trevors-Mac:src trevor$
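For anyone who wants to repeat the count in a single pass, here is a Python sketch that does roughly the same thing as the per-extension commands above. The function name `count_by_extension` is made up for this example, and the result can differ slightly from the shell version, since this one counts only regular file names and matches on the final extension.

```python
import os


def count_by_extension(root, extensions):
    """Count files under `root` whose final extension is in `extensions`."""
    counts = {ext: 0 for ext in extensions}
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1]  # e.g. "foo.cc" -> ".cc"
            if ext in counts:
                counts[ext] += 1
    return counts


# Example usage, run from the top of a source checkout:
#   counts = count_by_extension(".", [".cc", ".h", ".cpp", ".idl",
#                                     ".m", ".js", ".c", ".py"])
#   print(counts, sum(counts.values()))
```

Walking the tree once is usually faster than one traversal per extension on a checkout of this size.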

+6

Well, let's put it this way: when you write assembly, you can write MOV 0,eax (or something like that, I don't really know assembly), and it assembles to just a few bytes.

Programs in higher-level languages usually take up more space than their compiled machine code, because they must be understandable to humans. Another example: 2147483647 takes 10 bytes when written in source code, but only 4 when compiled.
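The 10-versus-4 comparison can be checked directly. The Python sketch below encodes the literal as ASCII text, the way it sits in a source file, and packs the same value as a 32-bit signed integer; the little-endian byte order is an assumption chosen for the example.

```python
import struct

literal = "2147483647"

# As it appears in a source file: one byte per ASCII digit.
as_text = literal.encode("ascii")

# As a 32-bit signed integer ("<i" = little-endian int32, assumed here).
as_int32 = struct.pack("<i", int(literal))

print(len(as_text), "bytes as text")
print(len(as_int32), "bytes as int32")
```

2147483647 is the largest value a signed 32-bit integer can hold, so four bytes are exactly enough for it.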

+3

At least part of the answer is that many words and characters in the source code are relevant only to the compiler and not to the executable. For example, the keywords "public" and "private" tell the compiler a lot about which code is allowed to access which variables or other code, but at the level of the binary executable that runs on the CPU, there is no such thing. The processor simply accesses whatever memory it is told to access.

+2