When processing a PDF file (2.pdf) using pdfminer (pdf2txt.py) I got the following error:
    $ pdf2txt.py 2.pdf
    Traceback (most recent call last):
      File "/usr/local/bin/pdf2txt.py", line 115, in <module>
        if __name__ == '__main__': sys.exit(main(sys.argv))
      File "/usr/local/bin/pdf2txt.py", line 109, in main
        interpreter.process_page(page)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_page
        self.render_contents(page.resources, page.contents, ctm=ctm)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 843, in render_contents
        self.init_resources(resources)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 347, in init_resources
        self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 195, in get_font
        font = self.get_font(None, subspec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 186, in get_font
        font = PDFCIDFont(self, spec)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 654, in __init__
        StringIO(self.fontfile.get_data()))
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 375, in __init__
        (name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
    struct.error: unpack requires a string argument of length 16
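The failing line reads one 16-byte entry of a TrueType (sfnt) table directory: a 4-byte tag followed by three big-endian unsigned 32-bit integers (checksum, offset, length). `struct.unpack` raises `struct.error` whenever `fp.read(16)` returns fewer than 16 bytes, which happens when the embedded font data ends before all the tables it declares have been listed. The sketch below is a simplified re-implementation written only for illustration: the function name `read_table_directory` is hypothetical, and the 12-byte header it reads first is the standard sfnt offset table, assumed rather than copied from pdfminer's source. It reproduces the same exception from a truncated font.

```python
import io
import struct


def read_table_directory(data):
    """Parse an sfnt (TrueType) table directory from raw font bytes.

    Hypothetical illustration modelled on the failing line in the traceback;
    this is not pdfminer's actual implementation.
    """
    fp = io.BytesIO(data)
    fonttype = fp.read(4)  # sfnt version, e.g. b'\x00\x01\x00\x00' for TrueType
    # Assumed standard header: numTables, searchRange, entrySelector, rangeShift.
    ntables, _search_range, _entry_selector, _range_shift = struct.unpack(
        '>HHHH', fp.read(8))
    tables = {}
    for _ in range(ntables):
        # Each directory entry is exactly 16 bytes: tag, checksum, offset, length.
        # If the stream is shorter than numTables promises, fp.read(16) returns
        # fewer bytes and struct.unpack raises struct.error.
        tag, _checksum, offset, length = struct.unpack('>4sLLL', fp.read(16))
        tables[tag] = (offset, length)
    return fonttype, tables
```

A header that declares two tables but is followed by only one complete entry fails on the second 16-byte read, just as in the traceback. If pdfminer reads a similar header before this loop, as the sketch assumes, the header of the font embedded in 2.pdf parsed fine and the data ran out partway through the directory. Under Python 3 the exception message is worded differently, but the exception type is the same.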
A similar file (1.pdf) is processed without any problem.
I cannot find any information about this error. I opened an issue in the pdfminer GitHub repository, but it went unanswered. Can someone explain why this is happening, and what I can do to parse 2.pdf?
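One possible stopgap, assuming it is acceptable to lose the text of any page whose embedded font cannot be parsed, is to catch `struct.error` per page instead of letting it abort the whole run. The helper below is hypothetical (it is not part of pdfminer) and takes the per-page callable as a parameter, so it does not depend on any particular pdfminer version.

```python
import struct


def process_pages_skipping_bad_fonts(pages, process_page):
    """Call process_page on each page, skipping pages that raise struct.error.

    Hypothetical helper: returns a list of (page_index, error_message) for the
    pages that failed, so they can be reported or retried later.
    """
    failures = []
    for index, page in enumerate(pages):
        try:
            process_page(page)
        except struct.error as exc:
            # A truncated embedded font makes struct.unpack raise struct.error;
            # record the failure and continue with the next page.
            failures.append((index, str(exc)))
    return failures
```

In pdf2txt.py the page loop calls `interpreter.process_page(page)` (line 109 in the first traceback), so that bound method could be passed as `process_page`. This only works around the problem: text on a skipped page is lost, and because the exception escapes partway through pdfminer's page processing, the output device may be left holding a partially processed page.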
Update: after installing pdfminer directly from the GitHub repository, I get a similar error, now with BytesIO instead of StringIO:
    $ pdf2txt.py 2.pdf
    Traceback (most recent call last):
      File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 116, in <module>
        if __name__ == '__main__': sys.exit(main(sys.argv))
      File "/home/danil/projects/python/pdfminer-source/env/bin/pdf2txt.py", line 110, in main
        interpreter.process_page(page)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 839, in process_page
        self.render_contents(page.resources, page.contents, ctm=ctm)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 850, in render_contents
        self.init_resources(resources)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 356, in init_resources
        self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 204, in get_font
        font = self.get_font(None, subspec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
        font = PDFCIDFont(self, spec)
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 665, in __init__
        BytesIO(self.fontfile.get_data()))
      File "/home/danil/projects/python/pdfminer-source/env/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 386, in __init__
        (name, tsum, offset, length) = struct.unpack('>4sLLL', fp.read(16))
    struct.error: unpack requires a string argument of length 16