How to display a PDF that was downloaded in Python

I downloaded a PDF file from the Internet using, for example:

    import requests
    pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")

I would like to modify this code to display it

    from gi.repository import Poppler, Gtk

    def draw(widget, surface):
        page.render(surface)

    document = Poppler.Document.new_from_file("file:///home/me/some.pdf", None)
    page = document.get_page(0)

    window = Gtk.Window(title="Hello World")
    window.connect("delete-event", Gtk.main_quit)
    window.connect("draw", draw)
    window.set_app_paintable(True)
    window.show_all()
    Gtk.main()

How do I change the "document = ..." line so that it uses the pdf variable holding the downloaded PDF instead of a file path?

(I don't mind using popplerqt4 or anything else if that makes it easier.)

+7
python pdf poppler pdf-rendering
6 answers

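It depends on the operating system you are using. One of these usually works:

    import os
    os.system('my_pdf.pdf')  # relies on the shell opening the file with its associated program

or, on Windows only:

    import os
    os.startfile('path_to_pdf.pdf')

or

    import webbrowser
    webbrowser.open(r'file:///my_pdf.pdf')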
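
The three options above can be folded into one helper that picks the opener by platform. This is only a sketch: the function names viewer_command and open_pdf are made up for illustration, and the Linux branch assumes a desktop where the xdg-open command is installed.

```python
import os
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the command that opens `path` in the default viewer.

    Returns None on Windows, where os.startfile is used instead of a command.
    """
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", path]
    # Assumption: a freedesktop-style desktop with xdg-open installed.
    return ["xdg-open", path]


def open_pdf(path):
    """Open a PDF file with whatever application the OS associates with it."""
    command = viewer_command(path)
    if command is None:
        os.startfile(path)  # only exists on Windows
    else:
        subprocess.run(command, check=True)
```

Calling open_pdf('my_pdf.pdf') would then hand the file to the default PDF viewer. The file has to exist on disk first, so the downloaded content still needs to be saved.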
+2

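How about using a temporary file? (Written for Python 3; note that the response body is pdf.content, and the file must be flushed before Poppler reads it.)

    import tempfile
    from urllib.parse import urljoin
    from urllib.request import pathname2url

    import requests
    from gi.repository import Poppler, Gtk

    pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")

    with tempfile.NamedTemporaryFile() as pdf_contents:
        pdf_contents.write(pdf.content)  # write the bytes, not the Response object
        pdf_contents.flush()             # make sure the data is on disk
        file_url = urljoin('file:', pathname2url(pdf_contents.name))
        document = Poppler.Document.new_from_file(file_url, None)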
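
A variant that may be more portable is to close the temporary file before handing its URL to Poppler, since the Python documentation notes that a file created by NamedTemporaryFile cannot be opened a second time by name on Windows while it is still open. The sketch below is an assumption-laden illustration: save_to_temp_pdf is a hypothetical helper, and the sample bytes stand in for pdf.content so that it runs without a network connection.

```python
import tempfile
from pathlib import Path


def save_to_temp_pdf(data):
    """Write PDF bytes to a temporary file and return its path.

    delete=False keeps the file after it is closed; the caller removes it.
    """
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as handle:
        handle.write(data)
    # Leaving the with-block closes the file, so all bytes are on disk.
    return Path(handle.name)


# Stand-in bytes; in the question this would be pdf.content.
sample = b"%PDF-1.4\n% placeholder, not a complete PDF\n"
pdf_path = save_to_temp_pdf(sample)
file_url = pdf_path.as_uri()  # a file:// URL for the temporary file
# document = Poppler.Document.new_from_file(file_url, None)
# pdf_path.unlink()  # remove the file once the document is no longer needed
```

The trade-off is that the file is no longer deleted automatically, so the caller has to remove it after use.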
0

Try this and tell me if it works:

    document = Poppler.Document.new_from_data(pdf.content, len(pdf.content), None)
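
If new_from_data does not accept the downloaded bytes, newer poppler-glib releases also offer Poppler.Document.new_from_bytes, which takes a GLib.Bytes object. The following is a sketch rather than a tested recipe: load_pdf_from_memory is a hypothetical helper, it assumes the installed Poppler bindings are recent enough to provide new_from_bytes, and the header check is only a rough sanity test (some valid PDFs have a few bytes before the header).

```python
def load_pdf_from_memory(data, password=None):
    """Build a Poppler document from PDF bytes without touching the disk."""
    # Rough sanity check: catches, for example, an HTML error page
    # that was downloaded instead of the PDF.
    if not data.startswith(b"%PDF-"):
        raise ValueError("data does not start with a PDF header")
    # Imported lazily so the check above works without PyGObject installed.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import GLib, Poppler
    # Assumption: poppler-glib is new enough to provide new_from_bytes.
    return Poppler.Document.new_from_bytes(GLib.Bytes.new(data), password)
```

With the code from the question, this would be called as document = load_pdf_from_memory(pdf.content).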
0

If you want to open the PDF with Acrobat Reader, the code below should work:

    import subprocess

    process = subprocess.Popen(
        ['<here path to acrobat.exe>', '/A', 'page=1', '<here path to pdf>'],
        shell=False,
        stdout=subprocess.PIPE,
    )
    process.wait()
0

There is also a library called pyPdf that you can use to load the PDF file. If you have further questions, send me a message.

0

August 2015: on a fresh installation under Windows 7, the problem remains the same:

 Poppler.Document.new_from_data(data, len(data), None) 

raises a TypeError: must be strings, not bytes.

 Poppler.Document.new_from_data(str(data), len(data), None) 

fails with the error: PDF document is damaged (4).

I could not use this function.
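
A likely explanation for the second failure can be seen in plain Python 3, without Poppler. Calling str() on a bytes object produces its printable representation, and a text string built from the bytes grows once it is encoded as UTF-8. The sketch below assumes that PyGObject encodes a Python str as UTF-8 before passing it to the C function; the seven sample bytes are made up for illustration.

```python
# "%PDF-" followed by two binary bytes, as found in real PDF files.
data = bytes([0x25, 0x50, 0x44, 0x46, 0x2D, 0xE2, 0xE3])

# str() on bytes yields the printable repr, not the raw content.
as_repr = str(data)
print(as_repr)

# Mapping each byte to a character keeps the length in characters...
as_text = "".join(map(chr, data))
# ...but encoding that text as UTF-8 turns each byte >= 0x80 into two bytes.
as_utf8 = as_text.encode("utf-8")
print(len(data), len(as_text), len(as_utf8))
```

In both cases the data that reaches Poppler no longer matches the original bytes or the length passed alongside it, which would be consistent with the "damaged" error and with get_page returning None.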

I tried using NamedTemporaryFile instead of a regular file on disk, but for some unknown reason it returns an unknown error.
Therefore, I am using an ordinary file on disk as the temporary file. Not the most beautiful way, but it works.

Here is the test code for Python 3.4, if anyone has an idea:

    import tempfile
    import urllib.parse
    import urllib.request

    from gi.repository import Poppler

    testfile = "d:/Mes Documents/en cours/PdfBooklet3/tempfiles/preview.pdf"

    document = Poppler.Document.new_from_file("file:///" + testfile, None)  # Works fine
    page = document.get_page(0)
    print(page)  # OK

    f1 = open(testfile, "rb")
    data1 = f1.read()
    f1.close()

    data2 = "".join(map(chr, data1))  # converts bytes to string
    print(len(data1))
    document = Poppler.Document.new_from_data(data2, len(data2), None)
    page = document.get_page(0)  # returns None
    print(page)

    pdftempfile = tempfile.NamedTemporaryFile()
    pdftempfile.write(data1)
    file_url = urllib.parse.urljoin('file:', urllib.request.pathname2url(pdftempfile.name))
    print(file_url)
    pdftempfile.seek(0)
    document = Poppler.Document.new_from_file(file_url, None)  # unknown error
0
