I managed to get a partial quantitative answer without an obscene amount of work. Here is my (admittedly hacky) methodology:
1) Use the following command to generate a list of packages with their installed size and their dependencies:
dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'
2) Parse the output and build a map of statistics, one entry per package:
#include <map>
#include <string>
struct PkgStats {
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;          // Installed-Size, in KB
    int dependentCount;
};
typedef std::map<std::string, PkgStats> PkgMap;
where dependentCount is the number of other packages that directly depend on this package.
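Steps 1 and 2 could be wired together roughly as follows. This is a minimal C++ sketch, not the exact code behind the numbers below: Record, parseRecord, dependencyNames and buildStats are illustrative names, and it assumes that every `|` alternative in a Depends field counts as a dependency and that names which are not installed packages (e.g. virtual packages) are simply ignored.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

struct PkgStats {
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;
    int dependentCount;
};
typedef std::map<std::string, PkgStats> PkgMap;

// One line of dpkg-query output: Package <TAB> Installed-Size <TAB> Depends
struct Record {
    std::string package;
    int kbSize;
    std::string depends;
};

// Splits one output line on tabs; returns false if there is no package name.
bool parseRecord(const std::string& line, Record& out) {
    out = Record{"", 0, ""};
    std::istringstream in(line);
    std::string size;
    if (!std::getline(in, out.package, '\t') || out.package.empty())
        return false;
    std::getline(in, size, '\t');
    std::getline(in, out.depends);                    // rest of the line, may be empty
    out.kbSize = size.empty() ? 0 : std::stoi(size);  // Installed-Size can be blank
    return true;
}

// Bare package names from a Depends field, e.g.
// "libc6 (>= 2.4), zlib1g | zlib-alt" -> {libc6, zlib1g, zlib-alt}.
// Assumption: every '|' alternative counts; version constraints and
// ":arch" qualifiers are dropped. A set avoids counting a name twice.
std::set<std::string> dependencyNames(const std::string& depends) {
    std::set<std::string> names;
    std::string clause;
    auto flush = [&]() {
        std::size_t b = clause.find_first_not_of(' ');
        if (b != std::string::npos) {
            std::size_t e = clause.find_first_of(" (:", b);
            names.insert(clause.substr(b, e == std::string::npos ? std::string::npos : e - b));
        }
        clause.clear();
    };
    for (char c : depends) {
        if (c == ',' || c == '|') flush();
        else clause += c;
    }
    flush();
    return names;
}

// Records installed sizes, then credits each installed dependency once per dependent.
PkgMap buildStats(const std::vector<Record>& records) {
    PkgMap stats;
    for (const Record& r : records)
        stats[r.package].kbSize = r.kbSize;
    for (const Record& r : records)
        for (const std::string& dep : dependencyNames(r.depends)) {
            auto it = stats.find(dep);
            // Skip names that are not installed packages (e.g. virtual ones) and self-references.
            if (it != stats.end() && dep != r.package)
                ++it->second.dependentCount;
        }
    return stats;
}
```

In practice you would pipe the dpkg-query output into a program that reads std::cin line by line, calls parseRecord on each line, and hands the collected records to buildStats.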
Results
Here is a list of the 20 most popular (i.e., most directly depended-upon) packages on my system:
Package | Installed KB | Dup'd MB
where Dup'd MB is the number of megabytes that would be duplicated if nothing were shared (= installed_size * (dependents_count - 1), converted from KB to MB, for dependents_count > 1).
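As a worked example of that formula: a 2048 KB library with three direct dependents would need two extra copies if nothing were shared, i.e. 2048 * 2 = 4096 KB = 4 MB. A tiny helper, assuming 1 MB = 1024 KB (dupMegabytes is an illustrative name):

```cpp
// Megabytes that would be duplicated if each direct dependent carried its
// own private copy of the package instead of sharing a single one.
// Assumes kbSize is dpkg's Installed-Size (KB) and 1 MB = 1024 KB.
double dupMegabytes(int kbSize, int dependentCount) {
    if (dependentCount <= 1)
        return 0.0;  // zero or one user: nothing is duplicated
    return kbSize * static_cast<double>(dependentCount - 1) / 1024.0;
}
```

Calling dupMegabytes(2048, 3) reproduces the 4 MB from the example.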
Not surprisingly, libc6 is on top. :) By the way, this is a fairly typical Ubuntu 9.10 installation with a number of programming-related packages and some GIS tools.
Some statistics:
- Total installed packages: 1717
- Average direct dependents: 0.92
- Total duplicate size without sharing (excluding indirect dependencies): 10.25 GB
- Histogram of the number of direct dependents per package (note the logarithmic Y axis):

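The summary numbers above can be computed from the per-package map in a single pass. A sketch under stated assumptions: the total duplicate size is taken to be the sum of the per-package Dup'd sizes, 1 GB is taken as 1024 * 1024 KB, and the histogram is just a count of packages per dependent count (the log-scale plotting is left to a charting tool). Summary and summarize are illustrative names.

```cpp
#include <cstddef>
#include <map>
#include <string>

struct PkgStats {
    PkgStats() : kbSize(0), dependentCount(0) {}
    int kbSize;
    int dependentCount;
};
typedef std::map<std::string, PkgStats> PkgMap;

struct Summary {
    std::size_t packages = 0;
    double avgDirectDependents = 0.0;
    double totalDupGB = 0.0;       // sum of duplicated sizes, in GB
    std::map<int, int> histogram;  // dependent count -> number of packages
};

Summary summarize(const PkgMap& stats) {
    Summary s;
    long long totalDependents = 0;
    double dupKB = 0.0;
    for (const auto& kv : stats) {
        const PkgStats& p = kv.second;
        ++s.packages;
        totalDependents += p.dependentCount;
        if (p.dependentCount > 1)  // only shared packages contribute duplicates
            dupKB += static_cast<double>(p.kbSize) * (p.dependentCount - 1);
        ++s.histogram[p.dependentCount];
    }
    if (s.packages > 0)
        s.avgDirectDependents = static_cast<double>(totalDependents) / s.packages;
    s.totalDupGB = dupKB / (1024.0 * 1024.0);
    return s;
}
```

Because the histogram is keyed by dependent count, it can be written out directly as two-column data for plotting.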
Note that the above completely ignores indirect dependencies (i.e., everything should be at least indirectly dependent on libc6). What I really should have done is build a graph of all the dependencies and base the statistics on it. Maybe someday I will get around to it and write a longer blog post with more detail and rigor.
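For the indirect case, one option is to invert every package's Depends list into a reverse graph and count everything reachable from each package. A minimal sketch, assuming the reverse graph has already been built; ReverseDeps and transitiveDependentCount are illustrative names, and breadth-first search is just one way to do the traversal:

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Reverse dependency graph: package -> packages that directly depend on it.
typedef std::map<std::string, std::vector<std::string>> ReverseDeps;

// Counts every package that depends on `pkg` directly or indirectly, using a
// breadth-first search over reverse edges. The `seen` set makes this safe
// when the dependency graph contains cycles.
int transitiveDependentCount(const ReverseDeps& rdeps, const std::string& pkg) {
    std::set<std::string> seen;
    std::queue<std::string> todo;
    todo.push(pkg);
    while (!todo.empty()) {
        std::string cur = todo.front();
        todo.pop();
        auto it = rdeps.find(cur);
        if (it == rdeps.end())
            continue;  // nothing depends on `cur`
        for (const std::string& dependent : it->second)
            if (dependent != pkg && seen.insert(dependent).second)
                todo.push(dependent);
    }
    return static_cast<int>(seen.size());
}
```

Substituting this count for dependentCount in the Dup'd formula would give a version of the statistics that includes indirect dependencies.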