How much disk space do shared libraries really save in modern Linux distributions?

In the static vs. shared libraries debate, I've often heard that shared libraries eliminate duplication and reduce overall disk space. But how much disk space do shared libraries really save in modern Linux distributions? How much more space would be needed if all programs were compiled using static libraries? Has anyone crunched the numbers for a typical desktop Linux distribution such as Ubuntu? Are there any statistics available?

ADDITION:

All of the answers were informative and appreciated, but they seemed to shoot down my question rather than attempt to answer it. Caleb was on the right track, but he chose to crunch the numbers for memory space instead of disk space (my question was about disk space).

Since programs only "pay" for the parts of static libraries that they actually use, it seems practically impossible to quantify what the disk space difference would be for all-static vs. all-shared.

Now that I realize my question is practically impossible to answer, I feel like withdrawing it. But I'll leave it here to preserve the informative answers.

So that SO stops nagging me to choose an answer, I'm going to pick the most popular one (even if it sidesteps the question).

+4
5 answers

I'm not sure where you heard this, but reduced disk space is mostly a red herring now that disk space costs pennies per gigabyte. The real gain with shared libraries comes from security updates and bug fixes to those libraries: applications using static libraries have to be individually rebuilt against the new libraries, whereas all applications using shared libraries can be updated at once by replacing only a few files.

+8

Shared libraries not only save disk space, they also save memory, and that is much more important. The prelinking step is important here ... you cannot share memory pages between two instances of the same library unless they are loaded at the same address, and prelinking allows that to happen.

+6

Shared libraries do not necessarily conserve disk space or memory.

When an application links against a static library, only those parts of the library that the application uses are pulled into the application binary. The library archive (.a) contains object files (.o), and if they are well factored, the application will use less memory by linking only the object files it uses. Shared libraries contain the whole library on disk and in memory, whether or not parts of it are used by any application.

For desktop and server systems this is unlikely to be a win overall, but if you are developing embedded applications, it is worth trying static linking for all of the applications to see whether it gives you an overall saving.

+4

OK, perhaps this is not an answer, but the memory savings are what I would consider. The savings depend on how many times a library is loaded after the first application loads it, so let's find out how much is saved per library on the system using a quick script:

    #!/bin/bash
    # For each shared library that lsof reports, count how many times it is
    # mapped and treat every mapping after the first one as saved memory.
    # The file name is the last column of lsof's output.
    lsof 2>/dev/null | grep 'lib.*\.so$' | awk '{print $NF}' | sort | uniq -c |
    while read -r cnt lib ; do
        # Size of the library file in bytes; skip entries that cannot be read.
        size=$(stat -Lc %s "$lib" 2>/dev/null) || continue
        echo "$lib: $(( (cnt - 1) * size ))"
    done

That will give us the savings per library, like so:

    ...
    /usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0
    /usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640
    /usr/lib64/qt4/plugins/imageformats/libqico.so: 791200
    ...

Then the total savings:

    $ ./checker.sh | awk '{total = total + $2}END{print total}'
    263160760

So, roughly speaking, I am saving about 250 megabytes of memory on my system. Your mileage will vary.
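To make the total easier to read, the byte count can be converted to mebibytes in the same awk pass. This is only a sketch: `total_mib` is a hypothetical helper name, and it assumes input lines in the `path: bytes` form that the script prints.

```shell
# total_mib: hypothetical helper (not part of the original script).
# Reads "path: bytes" lines on stdin and prints the sum of the byte
# counts in MiB (1 MiB = 1048576 bytes). The byte count is taken from
# the last field of each line.
total_mib() {
    LC_ALL=C awk '{ total += $NF } END { printf "%.1f MiB\n", total / 1048576 }'
}

# Example using the three sample lines from the output above.
printf '%s\n' \
    '/usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0' \
    '/usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640' \
    '/usr/lib64/qt4/plugins/imageformats/libqico.so: 791200' | total_mib
```

Fed the full total of 263160760 bytes, the same helper prints 251.0 MiB, which matches the "about 250 megabytes" estimate.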

+2

I managed to work out a partial quantitative answer without having to do an obscene amount of work. Here is my (hare-brained) methodology:

1) Use the following command to generate a list of packages with their installed size and list of dependencies:

    dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'

2) Parse the results and build a statistics map for each package:

    #include <map>
    #include <string>

    struct PkgStats
    {
        PkgStats() : kbSize(0), dependentCount(0) {}
        int kbSize;          // installed size in KB
        int dependentCount;  // number of direct dependents
    };
    typedef std::map<std::string, PkgStats> PkgMap;

Where dependentCount is the number of other packages that directly depend on this package.
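The counting in steps 1 and 2 can also be sketched in shell with awk instead of a C++ program. This is not the code used for the numbers in this answer; `count_dependents` is a hypothetical helper, and it makes some simplifying assumptions: version constraints in parentheses and `:arch` qualifiers are dropped, every alternative in `a | b` is counted as a dependency, and names that are not in the package list (for example virtual packages) are not reported.

```shell
# count_dependents: hypothetical helper. Reads lines of the form
#   package<TAB>installed-size-KB<TAB>depends
# and prints
#   package<TAB>installed-size-KB<TAB>direct-dependent-count
# sorted by dependent count, highest first.
count_dependents() {
    awk -F '\t' '
    {
        pkg = $1
        size[pkg] = $2
        deps = $3
        gsub(/\([^)]*\)/, "", deps)     # drop version constraints
        gsub(/[ \t]/, "", deps)         # drop whitespace
        n = split(deps, parts, "[,|]")  # split on "," and "|"
        split("", seen)                 # reset the per-package duplicate filter
        for (i = 1; i <= n; i++) {
            dep = parts[i]
            sub(/:.*/, "", dep)         # drop ":any" style qualifiers
            if (dep != "" && dep != pkg && !(dep in seen)) {
                seen[dep] = 1
                count[dep]++
            }
        }
    }
    END {
        # Only packages that appeared in column 1 are printed.
        for (pkg in size)
            printf "%s\t%s\t%d\n", pkg, size[pkg], count[pkg]
    }' | LC_ALL=C sort -t "$(printf '\t')" -k3,3nr -k1,1
}

# Small made-up sample in the same format as the dpkg-query output.
printf 'libc6\t10096\t\npython\t624\tlibc6 (>= 2.4)\nperl\t18852\tlibc6 (>= 2.4), python | python3:any\n' |
    count_dependents
```

On a Debian-based system the real input would come from piping the `dpkg-query` command from step 1 into `count_dependents`.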

Results

Here is a list of the top 20 most depended-upon packages on my system:

    Package              Installed KB   # Deps   Dup'd MB
    libc6                       10096      750       7385
    python                        624      112         68
    libatk1.0-0                   200       92         18
    perl                        18852       48        865
    gconf2                        248       34          8
    debconf                       988       23         21
    libasound2                   1428       19         25
    defoma                        564       18          9
    libart-2.0-2                  164       14          2
    libavahi-client3              160       14          2
    libbz2-1.0                    128       12          1
    openoffice.org-core        124908       11       1220
    gcc-4.4-base                  168       10          1
    libbonobo2-0                  916       10          8
    cli-common                    336        8          2
    coreutils                   12928        8         88
    erlang-base                  6708        8         46
    libbluetooth3                 200        8          1
    dictionaries-common          1016        7          6

where Dup'd MB is the number of megabytes that would be duplicated if there were no sharing ( = installed_size * (dependants_count - 1) , for dependants_count > 1 ).

It's no surprise to see libc6 on top. :) By the way, I have a typical installation of Ubuntu 9.10 with a few programming-related packages installed, as well as some GIS tools.
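As a worked check of that formula, here is a small awk sketch that recomputes the Dup'd MB column. `dupd_mb` is a hypothetical helper; it assumes three tab-separated input columns (package, installed size in KB, number of direct dependents) and divides by 1024 to convert KB to MB.

```shell
# dupd_mb: hypothetical helper. Reads
#   package<TAB>installed-KB<TAB>dependents
# and appends a fourth column:
#   installed-KB * (dependents - 1) / 1024, rounded to a whole number,
# or 0 when there is at most one dependent.
dupd_mb() {
    LC_ALL=C awk -F '\t' '{
        mb = ($3 > 1) ? $2 * ($3 - 1) / 1024 : 0
        printf "%s\t%s\t%s\t%.0f\n", $1, $2, $3, mb
    }'
}

# Two rows taken from the table above.
printf 'libc6\t10096\t750\npython\t624\t112\n' | dupd_mb
```

For libc6 this gives 10096 * 749 / 1024, which rounds to 7385, the value in the table; python comes out as 68.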

Some statistics:

  • Total installed packages: 1717
  • Average direct dependents: 0.92
  • Total duplicate size without sharing (excluding indirect dependencies): 10.25 GB
  • Histogram of # direct dependents (note the logarithmic Y scale): [histogram image]

Note that the above completely ignores indirect dependencies (i.e., everything should be at least indirectly dependent on libc6). What I really should have done is build a graph of all the dependencies and use that as the basis for my statistics. Maybe I'll get around to that someday and publish a long blog article with more details and rigor.
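A minimal sketch of what that graph-based count could look like, again in shell with awk. It was not used for any of the numbers here. `transitive_dependents` is a hypothetical helper that assumes the dependency data has already been reduced to one `package dependency` pair per line; it walks the reversed graph breadth-first from each package and counts every package that reaches it directly or indirectly.

```shell
# transitive_dependents: hypothetical helper. Reads "package dependency"
# pairs (one direct dependency edge per line) and prints, for every name
# seen, how many other packages depend on it directly or indirectly.
transitive_dependents() {
    awk '
    NF >= 2 {
        nodes[$1] = 1; nodes[$2] = 1
        # rdeps[dep] is a space-separated list of direct dependents of dep
        rdeps[$2] = rdeps[$2] " " $1
    }
    NF == 1 { nodes[$1] = 1 }
    END {
        for (start in nodes) {
            split("", seen)            # reset the visited set
            seen[start] = 1
            head = 1; tail = 1; queue[1] = start
            total = 0
            while (head <= tail) {     # breadth-first walk over dependents
                cur = queue[head++]
                n = split(rdeps[cur], kids, " ")
                for (i = 1; i <= n; i++) {
                    if (!(kids[i] in seen)) {
                        seen[kids[i]] = 1
                        queue[++tail] = kids[i]
                        total++
                    }
                }
            }
            printf "%s %d\n", start, total
        }
    }' | LC_ALL=C sort -k2,2nr -k1,1
}

# Made-up example: app -> libfoo -> libc6, and tool -> libc6.
printf '%s\n' 'app libfoo' 'libfoo libc6' 'tool libc6' | transitive_dependents
```

In this example libc6 has two direct dependents but three transitive ones, which is the effect the direct-only statistics miss. The visited set also keeps the walk from looping on circular dependencies.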

+2
