Starting with the accepted answer from Cairnarvon:
"It seems that the main reason PyPI needs mirrors is because they are there."
I would change this a bit:
It may have more to do with the way PyPI actually works and therefore has to be mirrored, which may add an extra bit (or two :-) to the real traffic.
At the moment, I think you MUST interact with the main index to know what needs to be updated in your repository. State is not simply accessible through timestamps in some public folder hierarchy. The bad news is that this takes rsync out of the equation. The good news is that you CAN talk to the index through JSON, OAuth, XML-RPC, or HTTP interfaces.
For XML-RPC:
$> python
>>> import xmlrpclib
>>> import pprint
>>> client = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
>>> client.package_releases('PartitionSets')
['0.1.1']
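As a rough sketch of how a mirror could use this interface to decide what to refresh: this assumes the index exposes an XML-RPC `changelog(since)` call whose entries start with the package name, followed by version, timestamp and action. The helper names `packages_to_refresh` and `fetch_changelog` are hypothetical, the sample entries are made up, and the code uses Python 3's `xmlrpc.client` rather than `xmlrpclib`.

```python
import xmlrpc.client


def packages_to_refresh(changelog_entries):
    """Return the sorted, de-duplicated package names touched by changelog entries."""
    return sorted({entry[0] for entry in changelog_entries})


def fetch_changelog(since, url="https://pypi.python.org/pypi"):
    """Ask the index for changes since a UNIX timestamp (needs network access)."""
    client = xmlrpc.client.ServerProxy(url)
    return client.changelog(since)


# Offline example with made-up entries in the assumed layout:
# (name, version, timestamp, action)
sample = [
    ("PartitionSets", "0.1.1", 1363996800, "new release"),
    ("requests", "1.1.0", 1363996900, "add source file"),
    ("PartitionSets", "0.1.1", 1363997000, "update description"),
]
print(packages_to_refresh(sample))
```

A mirror would store the timestamp of its last successful run and pass it as `since` on the next one, so only the touched packages get re-fetched.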
For JSON, for example:
$> curl https://pypi.python.org/pypi/PartitionSets/0.1.1/json
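The same JSON document can be read from Python. A minimal sketch, assuming the response carries a top-level `info` object with `name` and `version` keys and a `urls` list whose items have a `filename` key. The function names are hypothetical and the sample document is hand-written, not a real response.

```python
import json
import urllib.request


def fetch_release_json(name, version, base="https://pypi.python.org/pypi"):
    """Download and parse the JSON document for one release (needs network access)."""
    url = "%s/%s/%s/json" % (base, name, version)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_release(doc):
    """Return (name, version, [filenames]) from a parsed release document."""
    info = doc.get("info", {})
    files = [f.get("filename") for f in doc.get("urls", [])]
    return info.get("name"), info.get("version"), files


# Offline example with a hand-written document in the assumed layout.
sample = json.loads(
    '{"info": {"name": "PartitionSets", "version": "0.1.1"},'
    ' "urls": [{"filename": "PartitionSets-0.1.1.tar.gz"}]}'
)
print(summarize_release(sample))
```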
With approx. 30,000 packages hosted [1], some of which are downloaded 50,000 to 300,000 times a week [2] (for example distribute, pip, requests, paramiko, lxml, boto, redis and others), you really need mirrors, at least in terms of accessibility. Imagine what a user does when pip install NeedThisPackage fails: wait? In addition, company-wide PyPI mirrors are quite common, acting as proxies for networks that are otherwise unreachable. Finally, do not forget the wonderful multi-version testing made possible by virtualenv and friends. These are all, IMO, legitimate and potentially wonderful uses of packages ...
In the end, you never know what an agent really does with a downloaded package: do N users really use it, or is it just overwritten next time ... and after all, IMHO package authors should care more about the number and nature of uses than about the sheer number of potential users ;-)
Refs: The estimated numbers are from [1] https://pypi.python.org/pypi (29303 packages) and [2] http://pypi-ranking.info/week (weekly download numbers, retrieved 2013-03-23).