I have a very large XML file (more precisely, 20GB, and yes, I need all this). When I try to upload a file, I get this error:
Python(23358) malloc: *** mmap(size=140736680968192) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug Traceback (most recent call last): File "file.py", line 5, in <module> code = xml.read() MemoryError
This is the current code I have for reading an XML file:
from bs4 import BeautifulSoup xml = open('pages_full.xml', 'r') code = xml.read() xml.close() soup = BeautifulSoup(code)
Now, how am I going to fix this error and continue working on the script. I would try to split the file into separate files, but since I do not know how this will affect BeautifulSoup, as well as the XML data, I would prefer not to.
(XML data is a database dump from a wiki that I drive to, using it to import data from different time periods, using direct information from many pages)
python xml mediawiki beautifulsoup
Hairr
source share