MediaWiki stores file data in two or three places, depending on how you count:
The actual metadata for current file versions is stored in the image table. This is probably what you primarily want; you'll find the latest en.wikipedia dump of it here.
Data for old, superseded file revisions is moved to the oldimage table, which has basically the same structure as the image table. This table is also dumped; the latest dump is here.
Finally, each file also (usually) corresponds to a pretty much ordinary wiki page in namespace 6 (File:). You'll find the text of these pages in the XML dumps, the same as for any other page.
Oh, and the reason you don't find the files you linked to in the English Wikipedia dumps is that they come from the shared repository at Wikimedia Commons. You'll find them in the Commons data dumps instead.
As for downloading the actual files, here is the (apparently) official documentation. As far as I can tell, all they mean by "bulk download is currently (as of September 2012) available from mirrors but not offered directly from Wikimedia servers" is that if you want all the images in a tarball, you'll have to use a mirror. If you're only pulling a relatively small subset of the millions of images on Wikipedia and/or Commons, it should be fine to use the Wikimedia servers directly.
Just remember to exercise basic courtesy: send a user-agent string identifying yourself, and don't hit the servers too hard. In particular, I'd recommend running the downloads sequentially, so that you only start downloading the next file after you've finished the previous one. Not only is that easier to implement than parallel downloading anyway, but it ensures that you don't hog more than your share of the bandwidth, and it lets the download speed adapt more or less automatically to the server load.
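For what it's worth, the courtesy rules above could be sketched in Python roughly like this. It is only a minimal sketch using the standard library: the user-agent string, the one-second pause and all function names are my own placeholders, not anything Wikimedia prescribes.

```python
import time
import urllib.request

# Hypothetical value: replace with something that really identifies you.
USER_AGENT = "ExampleImageFetcher/0.1 (contact: you@example.org)"


def build_request(url):
    """Build a request that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url):
    """Download one file and return its bytes (blocks until finished)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()


def download_sequentially(urls, fetcher=fetch, pause=1.0):
    """Yield (url, data) pairs one at a time, never in parallel.

    The next download only starts after the previous one has finished,
    so the overall rate follows however fast the server responds.
    The extra pause is an assumed safety margin, not an official limit.
    """
    for url in urls:
        data = fetcher(url)
        yield url, data
        time.sleep(pause)
```

You would loop over `download_sequentially(list_of_urls)` and write each `data` to disk as it arrives, which keeps memory use flat even for a long list.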
P.S. Whether you download the files from a mirror or directly from the Wikimedia servers, you'll need to work out which directory they're in. Typical Wikipedia file URLs look like this:
http://upload.wikimedia.org/wikipedia/en/a/ab/File_name.jpg
where the "wikipedia/en" part identifies the Wikimedia project and language (for historical reasons, Commons is listed as "wikipedia/commons") and the "a/ab" part is given by the first two hex digits of the MD5 hash of the file name in UTF-8 (as it is encoded in the database dumps).
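The directory rule can be sketched in Python like this. The function name and the `project` parameter are mine, and I'm assuming the name should be hashed in its database form, with underscores in place of spaces; check a few results against real URLs before relying on it.

```python
import hashlib
from urllib.parse import quote


def upload_url(file_name, project="wikipedia/en"):
    """Build the upload.wikimedia.org URL for a file name.

    Assumption: the name is hashed in its database form, i.e. with
    spaces replaced by underscores and encoded as UTF-8.
    """
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # First hex digit, then the first two hex digits, then the name itself.
    return "http://upload.wikimedia.org/{}/{}/{}/{}".format(
        project, digest[0], digest[:2], quote(name)
    )
```

For a file hosted on Commons you would call it as `upload_url("Some file.jpg", project="wikipedia/commons")`.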