First of all: once a PDF is signed, you should not change a single byte of the document, because doing so invalidates the signature.
Second observation: the byte order mark (BOM) is not part of the PDF header (a PDF always starts with %PDF-1.). In this context, it is the value of the begin attribute in the XMP metadata processing instruction. I am not aware of any Java client that has a problem with this byte sequence anywhere in a file. If a client does have a problem with it, the problem lies with that client, not with the file.
The byte order mark indicates the presence of UTF-8 characters. In the context of XMP, we have a stream inside the PDF that contains a clear-text XML file that can be consumed by software that is not "PDF aware". For example:
2 0 obj
<</Type/Metadata/Subtype/XML/Length 3492>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.1.0-jc003">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        dc:format="application/pdf"
        pdf:Keywords="Metadata, iText, PDF"
        pdf:Producer="iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version); modified using iText® 5.5.4-SNAPSHOT ©2000-2014 iText Group NV (AGPL-version)"
        xmp:CreateDate="2014-11-07T16:36:55+01:00"
        xmp:CreatorTool="My program using iText"
        xmp:ModifyDate="2014-11-07T16:36:56+01:00"
        xmp:MetadataDate="2014-11-07T16:36:56+01:00">
      <dc:description>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">This example shows how to add metadata</rdf:li>
        </rdf:Alt>
      </dc:description>
      <dc:creator>
        <rdf:Seq>
          <rdf:li>Bruno Lowagie</rdf:li>
        </rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Metadata</rdf:li>
          <rdf:li>iText</rdf:li>
          <rdf:li>PDF</rdf:li>
        </rdf:Bag>
      </dc:subject>
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Hello World example</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
Such non-PDF-aware software will look for the sequence W5M0MpCehiHzreSzNTczkc9d, which is unlikely to appear by accident in a data stream.
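To illustrate what such a scan might look like, here is a minimal sketch in C# that searches a raw byte array for the packet id. The class and method names are hypothetical, and the sketch assumes a UTF-8 packet stored in an uncompressed, unencrypted stream; it is not how any particular tool is known to do it.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class XmpPacketScanner
{
    // The packet id taken from the xpacket processing instruction above.
    private const string PacketId = "W5M0MpCehiHzreSzNTczkc9d";

    // Returns the byte offset of the first occurrence of the packet id,
    // or -1 if it does not occur. Assumes the id is stored as single-byte
    // characters (true for UTF-8 packets, not for UTF-16/UTF-32 packets).
    public static int IndexOfPacketId(byte[] fileBytes)
    {
        if (fileBytes == null)
            return -1;

        byte[] needle = Encoding.ASCII.GetBytes(PacketId);
        for (int i = 0; i + needle.Length <= fileBytes.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && fileBytes[i + j] == needle[j])
                j++;
            if (j == needle.Length)
                return i;
        }
        return -1;
    }
}
```

A caller could pass the result of File.ReadAllBytes on the PDF; a return value of -1 only means the id was not found as plain bytes, not that the file has no metadata.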
The begin attribute indicates that the characters in the stream use UTF-8 encoding. The BOM bytes are there because it is good practice to include them, but they are not mandatory (ISO 16684-1).
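As a rough illustration of how the begin value can signal the encoding, the following C# sketch maps the raw bytes of the attribute value to an encoding name. The names are hypothetical, the byte patterns are simply the standard encodings of U+FEFF, and treating an empty value as UTF-8 is an assumption based on my reading of the XMP packet wrapper rules.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class XmpBeginAttribute
{
    // Maps the raw bytes found between the quotes of begin="..." to an
    // encoding name. An empty value is assumed here to mean UTF-8.
    public static string GuessEncoding(byte[] value)
    {
        if (value == null || value.Length == 0)
            return "UTF-8";
        if (Same(value, new byte[] { 0xEF, 0xBB, 0xBF }))
            return "UTF-8";
        if (Same(value, new byte[] { 0xFE, 0xFF }))
            return "UTF-16BE";
        if (Same(value, new byte[] { 0xFF, 0xFE }))
            return "UTF-16LE";
        if (Same(value, new byte[] { 0x00, 0x00, 0xFE, 0xFF }))
            return "UTF-32BE";
        if (Same(value, new byte[] { 0xFF, 0xFE, 0x00, 0x00 }))
            return "UTF-32LE";
        return "unknown";
    }

    // True when both arrays have the same length and the same bytes.
    private static bool Same(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
            return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```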
You can retrieve the metadata the way you do ( byte[] metadata = reader.Metadata; ), remove the bytes, and replace the stream using the PdfStamper instance as follows:
stamper.XmpMetadata = metadata;
After changing the metadata, you can sign the PDF file.
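The byte removal step mentioned above could look roughly like the following C# sketch. The class and method names are hypothetical; the helper only drops a UTF-8 BOM that sits at the very start of the array and deliberately leaves the BOM inside the begin attribute untouched.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class XmpBomHelper
{
    // UTF-8 encoding of U+FEFF, the byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns a copy of the input without a leading UTF-8 BOM.
    // If there is no leading BOM, the input array is returned unchanged.
    public static byte[] StripLeadingUtf8Bom(byte[] data)
    {
        if (data == null || data.Length < Utf8Bom.Length)
            return data;

        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (data[i] != Utf8Bom[i])
                return data;
        }

        byte[] result = new byte[data.Length - Utf8Bom.Length];
        Array.Copy(data, Utf8Bom.Length, result, 0, result.Length);
        return result;
    }
}
```

With the reader and stamper from your own code, this would be used as stamper.XmpMetadata = XmpBomHelper.StripLeadingUtf8Bom(reader.Metadata); before the signature is applied.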
Please note that one aspect of your question surprises me. You write:
It is very strange that the first three bytes of the XMP metadata contain the BOM. XMP metadata is supposed to start with <?xpacket . If it does not, you are doing the right thing by removing those bytes.
Caveat: a PDF may contain XMP metadata at different levels. You are currently examining the most common one: document-level metadata. You may also encounter PDFs with page-level XMP metadata, with XMP inside an image, and so on.