How can I extract javascript from a pdf file using a command line tool?

How can I extract a javascript object from a pdf file using a command line tool?

I am trying to build a GUI in Python that offers this function.

I found these two modules, PyPDF2 and pyPdf, but could not get started with them.

1 answer

When you deal with JavaScript in PDF files, be aware that there are two cases, which you cannot tell apart without examining the file carefully:

  • Harmless JavaScript
  • Malicious JavaScript

Case 1: Harmless, Useful, and Open JavaScript

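A sample PDF with embedded JavaScript that the OP can use for testing is available from PlanetPDF (the file ppjslc_commonex_3.pdf used below).

The simplest way to extract the code is pdfinfo -js (available only in recent versions, and only in the pdfinfo from Poppler; the pdfinfo from XPDF does not know -js!).

Example: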

$ pdfinfo -js ppjslc_commonex_3.pdf

 Title:          Planet PDF JavaScript Learning Center Example #2
 Author:         Chris Dahl, ARTS PDF Global Services
 Creator:        PScript5.dll Version 5.2.2
 Producer:       Acrobat Distiller 6.0.1 (Windows)
 CreationDate:   Thu Oct 28 18:13:38 2004
 ModDate:        Thu Oct 28 18:17:46 2004
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           AcroForm
 JavaScript:     yes
 Pages:          1
 Encrypted:      no
 Page size:      612 x 792 pts (letter)
 Page rot:       0
 File size:      84720 bytes
 Optimized:      no
 PDF version:    1.5

 Name Dictionary "docOpened":
 // variable to store whether document has been opened already or not
 var bAlreadyOpened;

 function docOpened()
 {

    if(bAlreadyOpened != "true")
    {
        // document has just been opened
        var d = new Date();
        var sDate = util.printd("mm/dd/yyyy", d);

                 // set date now
                 app.alert("About to insert date into field now");
        this.getField("todaysDate").value = sDate;

        // now set bAlreadyOpened to true so it doesn’t
        // run again
 bAlreadyOpened = "true";
    }
    else
    {
        // document has already been opened
    }
 }

 // call the docOpened() function
 docOpened();

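As you can see, the -js option prints the JavaScript code embedded in the PDF to <stdout>.

If the JavaScript was embedded for a legitimate, useful, harmless purpose, this method is all you need.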
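For the Python GUI mentioned in the question, the pdfinfo call could be wrapped roughly as follows. This is only a minimal sketch: the function name pdfinfo_js is made up for illustration, and it assumes that a Poppler pdfinfo with -js support is installed and on the PATH.

```python
import shutil
import subprocess


def pdfinfo_js(pdf_path, pdfinfo="pdfinfo"):
    """Return whatever `pdfinfo -js <pdf_path>` prints to stdout.

    `pdfinfo` is the name (or path) of the executable; it must be the
    Poppler build, since the XPDF one does not support -js.
    """
    exe = shutil.which(pdfinfo)
    if exe is None:
        raise FileNotFoundError(f"{pdfinfo} not found in PATH")
    result = subprocess.run(
        [exe, "-js", pdf_path],
        capture_output=True,
        text=True,
        errors="replace",  # do not fail on bytes that are not valid text
        check=True,        # raise CalledProcessError if pdfinfo fails
    )
    return result.stdout


if __name__ == "__main__":
    import sys
    print(pdfinfo_js(sys.argv[1]))
```

The returned text is the same as in the example above (metadata first, then the scripts), so a GUI could simply show it in a text widget.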

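Case 2: Malicious, Harmful, and Hidden JavaScript

PDF files that embed JavaScript for malicious purposes, however, must be expected to "obfuscate" it in various ways in order to hide it from detection.

Such JavaScripts may not be found by the method above.

For example, malicious JavaScript may avoid the "clear" names /JavaScript and /JS in the PDF source code. Instead, it may use a spelling that is still valid according to the PDF specification, but looks different.

Examples: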

/#4Aava#53cript
/J#61vaScrip#74
/#4a#61#76#61#53#63#72#69#70#74
[...]

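This works because the PDF specification allows any character of a name to be written as # followed by its two-digit hexadecimal ASCII code, so a name can be disguised by hex-encoding one or more of its characters.

As a result, a simple text search of the PDF for /JavaScript (for example, with grep -a) will not find it.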
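To illustrate the encoding, here is a small Python sketch that undoes the #xx escapes and counts /JS and /JavaScript names in the raw bytes of a file. The function names are hypothetical, and the scan is deliberately naive: it only looks at the raw file bytes, so names inside compressed streams or object streams are not seen.

```python
import re

# "#" followed by two hex digits stands for one byte of the name.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name starts with "/" and runs until whitespace or a PDF delimiter.
_NAME = re.compile(rb"/[^\x00\t\n\x0c\r /()<>\[\]{}%]+")


def decode_pdf_name(name: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def count_js_names(data: bytes) -> dict:
    """Return {name: (occurrences, obfuscated occurrences)} for /JS and /JavaScript."""
    counts = {"/JS": [0, 0], "/JavaScript": [0, 0]}
    for match in _NAME.finditer(data):
        raw = match.group(0)
        decoded = decode_pdf_name(raw).decode("latin-1")
        if decoded in counts:
            counts[decoded][0] += 1
            if b"#" in raw:  # the name was written with hex escapes
                counts[decoded][1] += 1
    return {name: tuple(pair) for name, pair in counts.items()}


if __name__ == "__main__":
    for example in (b"/#4Aava#53cript", b"/J#61vaScrip#74"):
        print(example, b"->", decode_pdf_name(example))
```

All three spellings listed above decode to /JavaScript this way.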

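To detect such cases, you need a tool that understands this encoding:

Didier Stevens has written Python scripts for analyzing PDF files (including ones that contain JavaScript, obfuscated or not).

For example, here is what pdfid.py reports for different PDF files: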

  • A PDF without JavaScript; full pdfid.py output:

    $ pdfid.py nojavascript.pdf
    
     PDFiD 0.2.1  nojavascript.pdf
      PDF Header: %PDF-1.5
      obj                  193
      endobj               193
      stream                54
      endstream             54
      xref                   1
      trailer                1
      startxref              1
      /Page                  1
      /Encrypt               0
      /ObjStm                0
      /JS                    0 
      /JavaScript            0
      /AA                   12
      /OpenAction            0
      /AcroForm              1
      /JBIG2Decode           0
      /RichMedia             0
      /Launch                0
      /EmbeddedFile          0
      /XFA                   0
      /Colors > 2^24         0
    
  • A PDF with embedded JavaScript, where the names /JS and /JavaScript appear in the clear:

    $ pdfid.py javascript1.pdf | grep -E '(/JS|/JavaScript)'
    
      /JS                   30
      /JavaScript           30
    
  • A PDF with embedded JavaScript, where the names /JavaScript and /JS are obfuscated:

    $ pdfid.py javascript2.pdf | grep -E '(/JS|/JavaScript)'
    
      /JS                   30(30)
      /JavaScript           30(30)
    

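    As you can see, pdfid.py prints in parentheses how many occurrences of a name are obfuscated. Here, 30 out of 30 /JavaScript names are obfuscated, so this PDF is very likely malicious. A "normal" PDF (with harmless JavaScript) has no reason to disguise these names ...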
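If you want to evaluate this from Python, the counts can be read from the pdfid.py text output. The following is a sketch only: it assumes the line layout shown above (a name starting with "/", a count, and an optional obfuscated count in parentheses), and both function names are made up for illustration.

```python
import re

# e.g. "      /JavaScript           30(30)"  or  "      /JS    0"
_LINE = re.compile(r"^\s*(/.+?)\s+(\d+)(?:\((\d+)\))?\s*$")


def parse_pdfid_counts(text: str) -> dict:
    """Map each /Name in pdfid.py output to (count, obfuscated count)."""
    counts = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            name, total, hidden = match.groups()
            counts[name] = (int(total), int(hidden or 0))
    return counts


def obfuscated_names(text: str) -> list:
    """Return the names for which pdfid.py reported obfuscated occurrences."""
    return [name for name, (_, hidden) in parse_pdfid_counts(text).items() if hidden > 0]


if __name__ == "__main__":
    sample = "      /JS                   30(30)\n      /JavaScript           30(30)\n"
    print(parse_pdfid_counts(sample))
    print(obfuscated_names(sample))
```

A non-empty result from obfuscated_names is a reason to treat the file with suspicion, not proof that it is malicious.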


Update

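(Some time after writing the above) I found another tool:

peepdf.py, which can extract JavaScript as well. It is a Python tool (script) for exploring and analyzing PDF files.

It has an extract command, which pulls the JavaScripts out of a PDF.

Steps:

  • Clone it from GitHub:
    git clone https://github.com/jesparza/peepdf.git git.peepdf
  • Symlink the script into a directory that is in your $PATH:
    cd git.peepdf ;
    ln -s $(pwd)/peepdf.py ${HOME}/bin/peepdf.py
  • Create a script file for PeePDF that contains the command for extracting the JavaScript:
    echo 'extract js > all-javascripts-from-my.pdf' > xtract.txt
  • Run PeePDF (in loose mode, -l, and force mode, -f) and pass it the script file with -s:
    peepdf.py -l -f -s xtract.txt my.pdf
  • Look at the extracted JavaScript: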
    cat all-javascripts-from-my.pdf
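
The same steps could be driven from a Python GUI by running peepdf.py as a separate process. This is a rough sketch: the helper names are hypothetical, and it assumes peepdf.py is executable and reachable via $PATH as set up above.

```python
import subprocess
import tempfile
from pathlib import Path


def peepdf_command(pdf_path, script_path, peepdf="peepdf.py"):
    """Build the same command line as in the steps above."""
    return [peepdf, "-l", "-f", "-s", str(script_path), str(pdf_path)]


def extract_js_with_peepdf(pdf_path, out_path, peepdf="peepdf.py"):
    """Write a one-line PeePDF script, run it, and return the extracted JavaScript."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "xtract.txt"
        # Same command as in the echo step above.
        script.write_text(f"extract js > {out_path}\n")
        subprocess.run(peepdf_command(pdf_path, script, peepdf), check=True)
    return Path(out_path).read_text(errors="replace")


if __name__ == "__main__":
    print(extract_js_with_peepdf("my.pdf", "all-javascripts-from-my.pdf"))
```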