Wikipedia article titles (no content)

I am doing a project for which I need the names of all Wikipedia articles (I do not need the content). Is there a place where I can download this data?

+6
web-scraping wikipedia
2 answers

Check out this page on Wikipedia; you can simply download the archive of article titles. Here is the direct link from the download page:

  • All titles (gzipped) - 32+ MB at the time of writing.

Edit:

You may notice that the list in enwiki-latest-all-titles-in-ns0.gz contains non-English titles (and some profanity). This is because, by default, most people create content on the main English wiki (language code en). If you explore the dumps for other languages, you'll see that they contain different sets of articles.
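For what it's worth, once the archive is downloaded you don't need to unpack it to read it. Here is a minimal Python sketch, assuming the file holds one title per line, may start with a `page_title` header line, and uses underscores in place of spaces. Those are assumptions about the dump layout, and `iter_titles` is just a name I made up for illustration.

```python
import gzip
from typing import Iterator


def iter_titles(path: str) -> Iterator[str]:
    """Yield article titles from a gzipped, one-title-per-line dump file."""
    # Text mode decodes on the fly, so the archive is never fully unpacked.
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            title = line.rstrip("\n")
            # Assumption: the dump may begin with a "page_title" header line.
            if line_number == 0 and title == "page_title":
                continue
            if title:
                # Assumption: underscores in the dump stand for spaces in the title.
                yield title.replace("_", " ")
```

Something like `for title in iter_titles("enwiki-latest-all-titles-in-ns0.gz"): print(title)` would then stream the titles without loading the whole list into memory.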

The main download page also links to the Wikipedia API, which can run some kinds of queries against Wikipedia, but I'm not sure it will solve your problem (the page taxonomy does not seem to provide an easy way to distinguish "English" content from "content on the English wiki").
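If you do want to try the API route, here is a rough sketch of what listing titles could look like with the MediaWiki `list=allpages` query. It only pages through main-namespace titles on the English wiki and does nothing about the language issue above. The helper names (`fetch_json`, `iter_api_titles`) and the User-Agent string are made up for illustration, and the `fetch` parameter exists only so the paging logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; replace it with one that identifies your project.
    request = urllib.request.Request(url, headers={"User-Agent": "title-lister-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_api_titles(fetch: Callable[[Dict[str, str]], dict] = fetch_json) -> Iterator[str]:
    """Yield main-namespace page titles, following the API's continuation tokens."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": "0",  # namespace 0 holds the articles
        "aplimit": "max",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

For a full list the downloadable dump is almost certainly the better choice, since paging through every title this way means a very large number of requests.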

+14

I don’t know of a single central list of articles, but if you just need a large number of them rather than a complete list (bearing in mind that any complete list will always be out of date), you could probably put something together with wget to recursively follow Wikipedia links from the home page and store the URLs you get.

0
