I have a project in which I collect all Wikipedia articles belonging to a certain category, extract them from a Wikipedia dump, and store them in our database.
So I need to parse the Wikipedia dump file to extract that content. Is there an efficient parser for this job? I am a Python developer, so I would prefer a parser written in Python. If there isn't one, suggest a parser in another language and I'll try to write a Python port of it and publish it online, so that other people can use it or at least try it.
In short, all I want is a Python parser for Wikipedia dump files. I have started writing one by hand that walks each node and extracts its content.
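To make the node-by-node idea concrete, here is a minimal sketch of what such a parser could look like using only the standard library's `xml.etree.ElementTree.iterparse`. It assumes the usual MediaWiki XML export layout, where each `<page>` contains a `<title>` and a `<revision>` with a `<text>` element, and that category membership shows up in the wikitext as `[[Category:Name]]`. The names `iter_pages` and `has_category` are made up for illustration and are not part of any existing library.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object (hypothetical helper).
    The file is streamed, so the whole dump is never loaded at once.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            # Release the page's subtree; the emptied element itself
            # stays attached to the root.
            elem.clear()


def has_category(wikitext, category):
    """Return True if the wikitext links directly to the given category.

    Assumption: a plain [[Category:Name]] or [[Category:Name|sortkey]] link.
    Subcategories and categories added through templates are not detected.
    """
    pattern = r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(\||\]\])"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real dump file.
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><title>A</title><revision>"
        b"<text>Hello [[Category:Physics]]</text></revision></page>"
        b"</mediawiki>"
    )
    for page_title, page_text in iter_pages(io.BytesIO(sample)):
        if has_category(page_text, "Physics"):
            print(page_title)
```

For a real compressed dump, the same function should accept a file object from `bz2.open(path, "rb")`, so the archive does not need to be unpacked first. Whether this is fast enough for a full dump is something I have not measured.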
python xml parsing wikipedia wiki
None-da