Reading PDF metadata in PHP

Question

I am trying to read metadata attached to arbitrary PDF files: title, author, subject and keywords.

Is there a PHP library, preferably open source, that can read PDF metadata? If so, how do I use it to retrieve the metadata? If not, how can I retrieve it without one?

To be clear, I'm not interested in creating or modifying PDF files or their metadata, and I don't need PDF bodies. I looked at several libraries, including FPDF (which everyone seems to recommend), but it seems to be intended only for creating PDFs and not for extracting metadata.

+7

php pdf metadata

user113292 Dec 20 '10 at 19:37

6 answers

I don't know of any libraries, but a simple way to get the same result might be to open the file and parse whatever comes after the last "end".

Try opening a PDF file in a text editor; the parser shouldn't take more than five lines.
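A rough sketch of that hand-rolled approach is below. It assumes the metadata sits in an uncompressed, unencrypted Info dictionary as literal strings such as `/Title (Some title)`; hex strings, UTF-16 values, nested unescaped parentheses and object streams are not handled. The function name `extractPdfInfo` and the sample bytes are made up for illustration.

```php
<?php
// Hypothetical helper: pulls literal-string Info entries out of raw PDF bytes.
function extractPdfInfo(string $raw): array
{
    $info = [];
    foreach (['Title', 'Author', 'Subject', 'Keywords'] as $key) {
        // Match "/Key (value)", allowing backslash-escaped characters in the value.
        $pattern = '~/' . $key . '\s*\(((?:\\\\.|[^\\\\()])*)\)~s';
        if (preg_match($pattern, $raw, $m)) {
            // Undo the escapes for parentheses and backslashes only.
            $info[$key] = preg_replace('~\\\\([()\\\\])~', '$1', $m[1]);
        }
    }
    return $info;
}

// Made-up sample bytes, so the sketch runs without a real PDF.
$sample = "1 0 obj\n<< /Title (Annual Report \\(draft\\)) /Author (Jane Doe) >>\nendobj\n";
$info = extractPdfInfo($sample);
print_r($info);

// For a real file, something like: extractPdfInfo(file_get_contents('file.pdf'));
```

Entries that are missing from the file are simply absent from the returned array, so callers should check with isset() before reading a key.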

+6

cbrandolino Dec 20 '10 at 19:45

PDF Parser does exactly what you want, and it's pretty simple:

 $parser = new \Smalot\PdfParser\Parser();
 $pdf = $parser->parseFile('document.pdf');
 // getDetails() returns the metadata as an array, not the document text
 $details = $pdf->getDetails();

You can try it on the demo page.
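To narrow the result down to the four fields from the question, a small filter along these lines might help. It assumes the details array uses the usual Info dictionary key names (`Title`, `Author`, `Subject`, `Keywords`) and that a value may occasionally be an array; `pickMetadata` is a made-up name, and the sample array stands in for the output of `$pdf->getDetails()` so the sketch runs without the library installed.

```php
<?php
// Hypothetical helper: keeps only the four fields asked about in the question.
function pickMetadata(array $details): array
{
    $wanted = ['Title', 'Author', 'Subject', 'Keywords'];
    $picked = [];
    foreach ($wanted as $key) {
        // Missing fields become null instead of raising a notice.
        $value = $details[$key] ?? null;
        // Flatten array values into one comma-separated string.
        $picked[$key] = is_array($value) ? implode(', ', $value) : $value;
    }
    return $picked;
}

// Stand-in for $pdf->getDetails() (assumed shape, not real library output).
$details = ['Title' => 'Annual Report', 'Keywords' => ['finance', '2013'], 'Pages' => 12];
$picked = pickMetadata($details);
print_r($picked);
```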

+4

Alessandro cosentino Mar 27 '14 at 10:41

I was looking for the same thing today and came across a small PHP class at http://de77.com/ that offers a quick and dirty solution. You can download the class directly. The output is UTF-8 encoded.

The creator says:

Here is a PHP class I wrote that you can use to get the title, author, and page count of any PDF file. It does not use any external application - just pure PHP.

 // basic example
 include 'PDFInfo.php';
 $p = new PDFInfo;
 $p->load('file.pdf');
 echo $p->author;
 echo $p->title;
 echo $p->pages;

It works for me! All credit goes to the creator of the class ... well, maybe a little to me too for finding it ;)

+3

maxpower9000 Mar 6 '13 at 10:48

You can use PDFtk to extract the number of pages:

 // Windows
 $bin = realpath('C:\\pdftk\\bin\\pdftk.exe');
 $cmd = "cmd /c {$bin} {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";

 // Unix
 $cmd = "pdftk {$path} dump_data | grep NumberOfPages | sed 's/[^0-9]*//'";

If ImageMagick is available, you can also use:

 $cmd = "identify -format %n {$path}";

Run it from PHP via shell_exec():

 $res = shell_exec($cmd);
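The same dump_data output can also yield the title and author, not just the page count. The sketch below parses it in PHP instead of piping through grep and sed. It assumes the output consists of `InfoKey:` / `InfoValue:` line pairs plus a `NumberOfPages:` line, which may differ between pdftk versions; `parsePdftkDump` is a made-up name, and the sample text stands in for real pdftk output.

```php
<?php
// Hypothetical helper: turns `pdftk <file> dump_data` output into an array.
function parsePdftkDump(string $dump): array
{
    $meta = [];
    $key = null;
    foreach (preg_split('/\R/', $dump) as $line) {
        if (preg_match('/^InfoKey:\s*(.+)$/', $line, $m)) {
            // Remember the key until its value line shows up.
            $key = trim($m[1]);
        } elseif ($key !== null && preg_match('/^InfoValue:\s*(.*)$/', $line, $m)) {
            $meta[$key] = trim($m[1]);
            $key = null;
        } elseif (preg_match('/^NumberOfPages:\s*(\d+)$/', $line, $m)) {
            $meta['NumberOfPages'] = (int) $m[1];
        }
    }
    return $meta;
}

// Sample text in the assumed layout, so the sketch runs without pdftk.
$dump = "InfoBegin\nInfoKey: Title\nInfoValue: Annual Report\n"
      . "InfoBegin\nInfoKey: Author\nInfoValue: Jane Doe\nNumberOfPages: 12\n";
$meta = parsePdftkDump($dump);
print_r($meta);

// With pdftk installed, the real call might look like:
// $dump = (string) shell_exec('pdftk ' . escapeshellarg($path) . ' dump_data');
```

Passing the path through escapeshellarg() guards against spaces and shell metacharacters in file names, which the interpolated `{$path}` in the commands above does not.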

+1

maxpower9000 Jan 9 '17 at 10:29

 <?php
 $sourcefile = "file path";
 $stringedPDF = file_get_contents($sourcefile, true);
 preg_match('/(?<=Title )\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./', $stringedPDF, $title);
 echo $all = $title[0];

0

ved uniyalas Aug 3 '17 at 8:26

user113292 · Accepted Answer · 2010-12-23T16:44:42+0000

Zend Framework includes Zend_Pdf, which makes it very simple:

 $pdf = Zend_Pdf::load($pdfPath);
 echo $pdf->properties['Title'] . "\n";
 echo $pdf->properties['Author'] . "\n";

Limitations: it works only with unencrypted files smaller than 16 MB.
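Given those limits, it may be worth checking a file before handing it to Zend_Pdf. The following is only a heuristic sketch: the 16 MB threshold comes from the limitation stated above, the `/Encrypt` scan assumes an encrypted PDF exposes that entry in plain text in its trailer, and `looksLoadable` is a made-up name. The throwaway files are not real PDFs; they only exercise the check.

```php
<?php
// Size limit taken from the limitation stated above.
const MAX_PDF_BYTES = 16 * 1024 * 1024;

// Hypothetical pre-check: small enough, and no sign of encryption.
function looksLoadable(string $path): bool
{
    $size = @filesize($path);
    if ($size === false || $size >= MAX_PDF_BYTES) {
        return false;
    }
    // Heuristic: encrypted PDFs normally carry an /Encrypt entry in the trailer.
    $raw = file_get_contents($path);
    return $raw !== false && strpos($raw, '/Encrypt') === false;
}

// Small demonstration with throwaway files.
$plain = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($plain, "%PDF-1.4\ntrailer\n<< /Root 1 0 R >>\n");
$locked = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($locked, "%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n");

$plainOk = looksLoadable($plain);
$lockedOk = looksLoadable($locked);
var_dump($plainOk, $lockedOk);

unlink($plain);
unlink($locked);
```

A file that fails the check could then be routed to one of the other approaches in this thread, such as the command-line tools.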
