Looking for an example or documentation for the wikidump Python library

I stumbled upon a Python wikidump library that looks like it does just what I need.

I could get by reading the source code, but I'm new to Python and I don't want to write BS code, since this project is very important to me.

I have the wiki-SPECIFICDATE-pages-articles.xml.bz2 file and need to use it as a source to retrieve a single article. Can someone give me some pointers on how to do this correctly or, even better, point me to some documentation? I could not find any!

(PS: if you know of a better, properly documented library, please tell me.)

1 answer

I'm not sure I understand the question, but if you have a Wikipedia dump and need to parse wikicode, I would suggest the mwparserfromhell library.

Another powerful framework is Pywikibot, which has long been the standard toolkit for bot operators on Wikipedia (so many of its scripts are about writing pages rather than reading and parsing articles). It has a lot of documentation (although sometimes outdated), and it uses the MediaWiki API.

You can use both together, of course: Pywikibot (PWB) for extracting articles and mwparserfromhell for parsing.
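
If the goal is only to pull one article out of the local pages-articles.xml.bz2 file, neither library is strictly required. Below is a minimal sketch that uses just the Python standard library (bz2 and xml.etree.ElementTree) instead of Pywikibot or wikidump. It assumes the usual MediaWiki export layout, where each <page> element contains a <title> and a <revision> with a <text> child. The names find_article and local_name are hypothetical helpers made up for this example.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that the export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def find_article(dump_path, wanted_title):
    """Stream a .xml.bz2 dump and return the wikitext of one page.

    Returns None when no page has exactly that title.
    """
    with bz2.open(dump_path, "rb") as stream:
        # iterparse reads incrementally, so the dump is never fully in memory.
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # first event is the start of the root
        for event, elem in context:
            if event != "end" or local_name(elem.tag) != "page":
                continue
            title = None
            text = None
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text or ""
            if title == wanted_title:
                return text
            # Discard pages already inspected so memory use stays flat.
            root.clear()
    return None
```

Calling find_article("wiki-SPECIFICDATE-pages-articles.xml.bz2", "Some title") scans the dump sequentially, so on a full dump it can take a long time before the page is reached. The string it returns is raw wikicode, which can then be handed to mwparserfromhell.parse() for parsing.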
