Update a PDF to include an encrypted, hidden, unique identifier?

Background

The idea is this:

  • A person provides contact information to purchase a book online.
  • The book, as a PDF, is marked with a unique hash.
  • The person receives the book.
  • PDF passwords are easy to get around or share

An ideal process would be something like this:

  • Create a hash based on contact information
  • Save contact information and hash in database
  • Acquire a lock on the book
  • Update the hash file with the hash text
  • Generate the book as a PDF (using pdflatex)
  • Apply the hash to the book
  • Release the book lock
  • Email the book to the person
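
The hashing and locking steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the class and method names (`BookMarker`, `hashFor`, `buildMarkedBook`, `BuildStep`) are hypothetical, the "hash" is taken to be an HMAC-SHA256 keyed with a server-side secret (a plain hash of the contact details could be recomputed by anyone who knows them), and the lock is a simple file lock held while the hash file is rewritten and the build runs.

```java
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; none of these names come from the original post.
class BookMarker {

    /** Keyed hash (HMAC-SHA256) of the buyer's contact information, hex encoded. */
    static String hashFor(String contactInfo, byte[] serverSecret)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(serverSecret, "HmacSHA256"));
        byte[] digest = mac.doFinal(contactInfo.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** The build step (for example, running pdflatex) supplied by the caller. */
    interface BuildStep {
        void run() throws Exception;
    }

    /**
     * Holds an exclusive file lock while the hash file is rewritten and the
     * book is built, so concurrent purchases cannot mix up their hashes.
     */
    static void buildMarkedBook(Path lockFile, Path hashFile, String hash, BuildStep build)
            throws Exception {
        try (FileChannel channel = FileChannel.open(
                     lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            Files.write(hashFile, hash.getBytes(StandardCharsets.UTF_8));
            build.run();
        } // the lock is released first, then the channel is closed
    }
}

// Possible usage, assuming book.tex reads hash.tex via \input:
//   BookMarker.buildMarkedBook(Paths.get("book.lock"), Paths.get("hash.tex"), hash,
//       () -> new ProcessBuilder("pdflatex", "book.tex").inheritIO().start().waitFor());
```

Saving the contact information and hash to the database, and sending the email, are left out because they depend on the host setup.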

Technologies

You can use the following technologies (other programming languages are possible, but libraries are likely to be limited to those provided by the host):

  • C, Java, PHP
  • LaTeX Files
  • PDF files
  • Linux

Question

What programming techniques (or open source software) should I explore to:

  • Embed a unique hash (or other mark) in a PDF
  • Create a collusion-resistant mark
  • Develop a non-fragile solution (e.g. the mark survives PDF -> EPS -> PDF)

Research

I have looked at the following options:

  • Steganography
  • Natural Language Processing (NLP)
  • Convert blank pages in the PDF to images; mark those images; reassemble the PDF
  • LaTeX watermark package
  • ImageMagick

Problems

The possible solutions I investigated have the following problems:

  • Steganography. (a) It requires a master copy of the images converted to EPS, which is intensive and time-consuming; (b) it is unclear whether the watermark survives PDF -> EPS -> PDF or other conversions; (c) most of the images are drawings or screenshots, not photographs.
  • LaTeX. It creates an image cache; any steganographic solution must somehow hook into that process.
  • NLP. It introduces grammatical errors and may change the meaning of technical terms.
  • Blank pages. They are immediately suspicious and easy to replace.
  • Watermark package. It draws visible marks.
  • ImageMagick. It draws visible marks.

What other solutions are possible?

Thanks!

1 answer

I did this for another project with PDFlib. We needed traceability for the generated PDF files in case a file leaked. Essentially:

  • We created an initial PDF template with the content in place and set the document's master password with the necessary permissions (no editing, no printing, no screen reading, etc.).
  • At run time, we applied several watermarks: a page footer that said "This document has been issued to user No. 12345"; several metadata fields set with the user ID, the download IP address, and the download date/time; a "this document is copyrighted ..." title page; etc.
  • If necessary, we added a user password to force a password prompt when the document is opened.
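
As a rough illustration of the per-user watermark data described above, the strings could be assembled before being handed to whichever PDF library is in use. This sketch deliberately does not call PDFlib; the class name `WatermarkFields` and the metadata keys (`IssuedToUser`, `IPAddress`, `IssuedAt`) are hypothetical, and only the footer wording comes from the description above.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds the watermark strings; writing them
// into the PDF is left to the PDF library.
class WatermarkFields {

    /** Visible footer text, using the wording quoted in the answer. */
    static String footerFor(long userId) {
        return "This document has been issued to user No. " + userId;
    }

    /** Metadata fields to store in the document; the key names are assumptions. */
    static Map<String, String> metadataFor(long userId, String ipAddress, Instant issuedAt) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("IssuedToUser", Long.toString(userId));
        fields.put("IPAddress", ipAddress);
        fields.put("IssuedAt", issuedAt.toString()); // ISO-8601, UTC
        return fields;
    }
}
```

Keeping the same values in a database alongside the user record would let a leaked file be matched back to its download even if only some of the fields survive.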

Since recent versions of PDF use AES-128 encryption, we simply set a randomly generated, high-entropy, 128-character master password. No one will ever type it by hand, so the difficulty of typing it does not matter to us and is actually preferable. The master password prevents end users from making any changes to the document. The various no-print / no-screen-read options are only enforced by the PDF reader and therefore are not reliable, but it cannot hurt to set them.
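
A random master password of this kind could be generated along the following lines. This is a minimal sketch, not the code used in that project: the class name `MasterPassword` and the alphanumeric alphabet are assumptions, and `java.security.SecureRandom` is used so that the characters come from a cryptographically strong source.

```java
import java.security.SecureRandom;

// Hypothetical helper for generating a throwaway master password.
class MasterPassword {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns a random password of the given length drawn from ALPHABET. */
    static String generate(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

Calling `MasterPassword.generate(128)` gives a 128-character password. The length is a parameter because PDF libraries and encryption revisions may differ in how many password characters they actually use.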

The disadvantage is that PDFlib licensing is pretty steep. I don't know whether any of the free PHP PDF libraries support the latest PDF encryption schemes, especially the master password, but if the budget can support it, PDFlib is a good way to create the document securely.


Source: https://habr.com/ru/post/650453/

