You can't just replace text in strings. I don't say this flippantly. I worked on Acrobat many years ago, including on the text search tool in the initial version, so I have a pretty deep understanding of the text encoding issues. The main problem is that every string in a PDF is encoded in some way. This is because PDF was created before Unicode was generally available and has its roots in PostScript. PostScript had very flexible encoding methods for fonts and encouraged re-encoding.
So, let's take a step back and understand the whole picture.
A string in a PDF that is meant to be rendered by one of the text operators is, by default, encoded as a sequence of 8-bit characters. To determine which glyph is drawn for each byte, the byte is pushed through the font's encoding vector. The encoding vector maps the byte to a glyph name, which is then looked up in the font and drawn on the page. Keep in mind that this description is a half-truth (more on that later).
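To make the byte-to-glyph lookup concrete, here is a minimal sketch in Python. The tiny encoding vector and glyph table are made up for illustration (a real font has up to 256 entries in the vector), and `glyphs_for_string` is a hypothetical helper, not part of any PDF library.

```python
# Toy model of the default path: byte -> glyph name -> glyph in the font.
# Both tables are hypothetical and deliberately tiny.
ENCODING_VECTOR = {0x20: "space", 0x41: "A", 0x42: "B"}  # byte -> glyph name
FONT_GLYPHS = {"space": "<empty outline>", "A": "<outline A>", "B": "<outline B>"}


def glyphs_for_string(raw):
    """Return the glyph name drawn for each byte of a PDF string.

    '.notdef' stands in for bytes that the encoding vector does not map,
    or that map to a glyph name the font does not contain.
    """
    names = []
    for byte in raw:
        name = ENCODING_VECTOR.get(byte, ".notdef")
        names.append(name if name in FONT_GLYPHS else ".notdef")
    return names


print(glyphs_for_string(b"AB A"))
```

The point is that the bytes mean nothing on their own: change the vector and the same bytes draw different glyphs.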
Most PDF-generating applications are kind and just use a standard encoding such as StandardEncoding or WinAnsiEncoding, most of which are fairly reasonable. Others use a standard encoding together with a delta encoding, which lists the differences between the standard encoding and the encoding actually used.
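A rough sketch of how such a delta can be applied, in Python. It assumes the differences are given as a flat list in which an integer sets the current code and each glyph name that follows is assigned to consecutive codes, which is how the PDF `/Differences` array is laid out. The base encoding here is a made-up three-entry slice, and `apply_differences` is a hypothetical helper.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (byte -> glyph name) with the overrides applied.

    An integer in `differences` sets the current code; each glyph name that
    follows is assigned to the current code, which is then incremented.
    """
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("differences must start with a code")
            encoding[code] = item
            code += 1
    return encoding


# Hypothetical slice of a base encoding, for illustration only.
base = {0x41: "A", 0x42: "B", 0x80: "bullet"}
custom = apply_differences(base, [0x80, "Adieresis", "Aring"])
print(custom)
```

After this, byte 0x80 no longer means what the base encoding says it means, so anything that assumes the base encoding will write the wrong glyph.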
Some applications try to be more economical in the PDF they generate, so they look at the glyphs they use and decide to embed only a subset of the font. If they use only upper- and lower-case letters and digits, they rebuild the font with just those glyphs, and they may also re-index them and provide an encoding vector so that byte 0x00 goes to the glyph 'a', 0x01 goes to the glyph 'b', and so on.
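Here is a small Python sketch of that kind of re-indexing. It simplifies by treating one character as one glyph and by numbering glyphs in order of first use; real producers may pick any order. Both function names are hypothetical.

```python
def build_subset_encoding(text):
    """Assign codes 0, 1, 2, ... to the distinct characters of `text`,
    in order of first use. Assumes at most 256 distinct characters."""
    char_to_code = {}
    for ch in text:
        if ch not in char_to_code:
            char_to_code[ch] = len(char_to_code)
    return char_to_code


def encode_with_subset(text, char_to_code):
    """Encode `text` as bytes using the subset's private codes.

    Raises KeyError for any character that was not part of the subset.
    """
    return bytes(char_to_code[ch] for ch in text)


mapping = build_subset_encoding("abba cab")
encoded = encode_with_subset("abba cab", mapping)
print(mapping)
print(encoded)
```

With this mapping, trying to encode a replacement string containing 'd' fails with a `KeyError`: the glyph was never embedded and has no code. That is exactly the trap when swapping text into a subset font.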
Now back to the half-truth. There is a class of fonts that are encoded with a character identifier (or CID), and TrueType and OpenType fonts fall into this category. In this case you get access to Unicode, but there is again an encoding step in which the string, which is now UTF-16BE, is mapped to a CID, which is used to get the glyph from the font. And for no particularly good reason, Adobe uses a PostScript function to do the mapping. Again, this is about three-quarters true, because there is a different encoding for Chinese, Japanese, and Korean fonts.
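As a very simplified sketch of the CID path, in Python: the string is read as 2-byte big-endian codes and each code is looked up in a table to get a CID. The table below is invented for illustration (real mappings live in the font's mapping resource and are far larger), and I use 0 as the "missing" value purely as an assumption of this sketch.

```python
import struct

# Hypothetical code -> CID table; the CID numbers are made up.
CODE_TO_CID = {0x0041: 34, 0x00C4: 98}


def cids_for_string(raw):
    """Split `raw` into 2-byte big-endian codes and map each one to a CID.

    Codes missing from the table map to 0 in this sketch.
    """
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    codes = struct.unpack(">%dH" % (len(raw) // 2), raw)
    return [CODE_TO_CID.get(code, 0) for code in codes]


# "A" followed by A with dieresis (U+00C4), encoded as UTF-16BE.
print(cids_for_string("A\u00c4".encode("utf-16-be")))
```

Even here the Unicode value is only the input to another lookup; the CID, not the code point, is what selects the glyph.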
So, before you blithely place a character in a string for a PDF font, you should ask a few questions:
- Is my glyph in the font?
- Is my glyph encoded?
- What is the encoding of my glyph?
And any of those answers may differ from what you expect. So, for example, if you want to put in Ä (A with dieresis), you need to check whether the font has the glyph (it may not, because the font is a subset). Then the font may have a funny encoding that does not include the glyph. Finally, the actual byte value to use for Ä may not be the standard one.
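Those three questions can be turned into a small check. This is only a sketch for the simple 8-bit case: the font is modeled as a set of glyph names, the encoding as a byte-to-name dictionary, and the glyph name `Adieresis` and every function and class name are placeholders I made up for the example.

```python
class GlyphNotInFont(Exception):
    """The font (possibly a subset) does not contain the glyph at all."""


class GlyphNotEncoded(Exception):
    """The glyph is in the font, but no byte value maps to it."""


def byte_for_glyph(glyph_name, font_glyphs, encoding):
    """Answer the three questions in order and return the byte to write."""
    # 1. Is my glyph in the font?
    if glyph_name not in font_glyphs:
        raise GlyphNotInFont(glyph_name)
    # 2. Is my glyph encoded?  3. If so, under which byte value?
    for code in sorted(encoding):
        if encoding[code] == glyph_name:
            return code
    raise GlyphNotEncoded(glyph_name)


# Hypothetical subset font with a re-indexed encoding.
font_glyphs = {"A", "Adieresis", "B"}
encoding = {0x00: "A", 0x01: "Adieresis"}
print(byte_for_glyph("Adieresis", font_glyphs, encoding))
```

In this made-up font the byte for `Adieresis` is 0x01, not the value any standard encoding would suggest, while `B` is present but cannot be written at all without changing the encoding.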
So when I see someone trying to just replace a piece of PDF text, all I see is a world of pain. For most normal PDFs this will work, say, in 90% of cases, but for something exotic, good luck. The differences in rendering PDF text are so painful that it is sometimes easier to think of it as a write-only format.