Remove byte order mark from signed PDF file?

Question

I use iTextSharp 5.5.1 to digitally sign PDF files with a detached signature (obtained from a third-party authority). Everything seems to work fine: the file is valid and Adobe Reader, for example, reports no problems and displays the signatures as valid.

The problem is that Java clients apparently have trouble with these files: the file cannot be opened or parsed. The files contain a byte order mark in the header (\x00EF\x00BB\x00BF), which apparently causes this behavior.

I was able to identify the BOM as follows:

  PdfReader reader = new PdfReader(path);
  byte[] metadata = reader.Metadata;
  // metadata[0], metadata[1], metadata[2] contain the BOM

How can I either remove the BOM (without invalidating the signature) or make the iTextSharp library not add these bytes to the files?

+7

c# pdf byte-order-mark itextsharp digital-signature

lukasz Oct 9 '14 at 13:59


2 answers

Just a quick approach:

First: save both files unencrypted. Second: remove bytes 0 to 2 of the metadata before saving the file.

However, there are some considerations: does the signing method require the BOM? Does the encryption method require it?

You also need to find out at what stage the BOM is added before you can determine whether you can remove it.

I will have a quick look through my PDF documentation and see what I can find. The easiest (though incomplete) way is to load the whole file as a byte array and simply strip the xEF xBB xBF from the very beginning, then do any signing / encryption. However, the bytes may be added again ...

I will post an update over the weekend :)

+1

GMasucci Nov 07 '14 at 11:25


Bruno Lowagie · Accepted Answer · 2014-11-07T15:57:14+0000

First of all: once a PDF document has been signed, you must not change any of its bytes, because doing so invalidates the signature.

Second observation: the byte order mark is not part of the PDF header (a PDF always starts with %PDF-1. ). In this context, it is the value of the begin attribute in the XMP metadata processing instruction. I am not aware of any Java client that has a problem with this byte sequence anywhere in the file. If a client does have a problem with it, the problem lies with that client, not with the file.
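To check where the bytes actually sit in a given file, a small probe along these lines may help. It is only a sketch using the .NET base class library; the class and method names (PdfBomProbe, StartsWithPdfHeader, FindBomOffsets) are made up for illustration and are not part of iTextSharp. Note that EF BB BF can also occur by chance inside binary or compressed streams, so a hit is a hint, not proof.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class PdfBomProbe
{
    // True when the data begins with the ASCII bytes "%PDF-".
    public static bool StartsWithPdfHeader(byte[] data)
    {
        byte[] header = Encoding.ASCII.GetBytes("%PDF-");
        if (data.Length < header.Length) return false;
        for (int i = 0; i < header.Length; i++)
        {
            if (data[i] != header[i]) return false;
        }
        return true;
    }

    // Returns the offset of every EF BB BF sequence found in the data.
    public static List<int> FindBomOffsets(byte[] data)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 3 <= data.Length; i++)
        {
            if (data[i] == 0xEF && data[i + 1] == 0xBB && data[i + 2] == 0xBF)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

Feeding it the result of File.ReadAllBytes(path) should show whether the file starts with %PDF- and whether the reported offsets lie well inside the file (for example right after begin=") rather than at offset 0.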

The byte order mark indicates the presence of UTF-8 characters. In the context of XMP, we have a stream inside the PDF that contains a clear-text XML file that can be used by software that is not PDF-aware. For example:

 2 0 obj
 <</Type/Metadata/Subtype/XML/Length 3492>>stream
 <?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?>
 <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <rdf:Description rdf:about=""
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
         xmlns:xmp="http://ns.adobe.com/xap/1.0/"
       dc:format="application/pdf"
       pdf:Keywords="Metadata, iText, PDF"
       pdf:Producer="iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version)"
       xmp:CreateDate="2014-11-07T16:36:55+01:00"
       xmp:CreatorTool="My program using iText"
       xmp:ModifyDate="2014-11-07T16:36:56+01:00"
       xmp:MetadataDate="2014-11-07T16:36:56+01:00">
       <dc:description>
         <rdf:Alt>
           <rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
         </rdf:Alt>
       </dc:description>
       <dc:creator>
         <rdf:Seq>
           <rdf:li>Bruno Lowagie</rdf:li>
         </rdf:Seq>
       </dc:creator>
       <dc:subject>
         <rdf:Bag>
           <rdf:li>Metadata</rdf:li>
           <rdf:li>iText</rdf:li>
           <rdf:li>PDF</rdf:li>
         </rdf:Bag>
       </dc:subject>
       <dc:title>
         <rdf:Alt>
           <rdf:li xml:lang="x-default">Hello World example</rdf:li>
         </rdf:Alt>
       </dc:title>
     </rdf:Description>
   </rdf:RDF>
 </x:xmpmeta>
 <?xpacket end="w"?>
 endstream

Such non-PDF-aware software will look for the sequence W5M0MpCehiHzreSzNTczkc9d, which is unlikely to appear by chance in a data stream.

The begin attribute indicates that the characters in the stream use UTF-8 encoding. The bytes are there because including them is good practice, but they are not mandatory (ISO 16684-1).
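To see what a given XMP packet carries in its begin attribute, something like the following could be used. This is a rough sketch on plain byte arrays; XpacketBegin and its methods are hypothetical names, and it assumes the attribute is written with double quotes as in the example above (a packet using single quotes would not be matched).

```csharp
using System;
using System.Text;

// Hypothetical helper for inspecting an XMP packet header.
public static class XpacketBegin
{
    // Returns the raw bytes between the quotes of the first begin="..." attribute,
    // or null when no such attribute is found.
    public static byte[] ReadBeginValue(byte[] xmp)
    {
        byte[] marker = Encoding.ASCII.GetBytes("begin=\"");
        int start = IndexOf(xmp, marker);
        if (start < 0) return null;
        start += marker.Length;
        int end = Array.IndexOf(xmp, (byte)'"', start);
        if (end < 0) return null;
        byte[] value = new byte[end - start];
        Array.Copy(xmp, start, value, 0, value.Length);
        return value;
    }

    // True when the value is exactly the UTF-8 byte order mark EF BB BF.
    public static bool IsUtf8Bom(byte[] value)
    {
        return value != null && value.Length == 3
            && value[0] == 0xEF && value[1] == 0xBB && value[2] == 0xBF;
    }

    // Naive byte-sequence search; returns -1 when the needle is absent.
    private static int IndexOf(byte[] haystack, byte[] needle)
    {
        for (int i = 0; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

An empty value (begin="") and a three-byte BOM value are both plausible results here, since the bytes are optional.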

You can get the metadata the way you do ( byte[] metadata = reader.Metadata; ), remove the bytes, and replace the stream using a PdfStamper instance as follows:

  stamper.XmpMetadata = metadata;

After changing the metadata, you can sign the PDF file.
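The "remove the bytes" step could look roughly like this. It is a minimal sketch that only touches a leading EF BB BF and leaves everything else alone; XmpBom.StripLeadingUtf8Bom is a hypothetical helper name, not an iTextSharp API.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class XmpBom
{
    // Returns the metadata without a leading UTF-8 BOM (EF BB BF).
    // The input array is returned unchanged when no leading BOM is present.
    public static byte[] StripLeadingUtf8Bom(byte[] metadata)
    {
        if (metadata == null) return null;
        bool hasBom = metadata.Length >= 3
            && metadata[0] == 0xEF && metadata[1] == 0xBB && metadata[2] == 0xBF;
        if (!hasBom) return metadata;

        byte[] trimmed = new byte[metadata.Length - 3];
        Array.Copy(metadata, 3, trimmed, 0, trimmed.Length);
        return trimmed;
    }
}
```

With the reader and stamper from above, the call would presumably be stamper.XmpMetadata = XmpBom.StripLeadingUtf8Bom(reader.Metadata); and it has to happen before the signature is applied, never after.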

Please note that one aspect of your question surprises me. You write:

 // metadata[0], metadata[1], metadata[2] contain the BOM

It is very strange that the first three bytes of the XMP metadata contain a BOM. XMP metadata is expected to start with <?xpacket . If that is not the case, you are doing the right thing by removing these bytes.

Caution: a PDF may contain XMP metadata at different levels. You are currently looking at the most common one: document-level metadata. You may also encounter PDF files with XMP metadata at the page level, with XMP inside an image, etc.
