I found a Python script (Wikipedia Extractor) that can generate plain text from an English Wikipedia database dump. When I run this command (as indicated on the script's page):
$ python enwiki-latest-pages-articles.xml WikiExtractor.py -b 500K -o extracted
I get this error:
  File "enwiki-latest-pages-articles.xml", line 1
    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
    ^
SyntaxError: invalid syntax
I am running the script with Python 2.7.6 under Cygwin on Windows 7.
I am hoping that someone who has already used this script, or who has experience with Python, can help me solve this error.
Thanks in advance!
Asim