Write Arabic Symbols Using PDFBOX

  • Update 1

I am trying to write Arabic characters in a PDF document using PDFBox, but the output contains some weird characters. Below is the code snippet that I used for my test. Please note that the same code prints Latin characters without any problems.

    public static void main(String[] args) throws Exception {
        PDDocument document = new PDDocument();
        PDPage page = new PDPage(PDPage.PAGE_SIZE_A4);
        document.addPage(page);
        PDPageContentStream stream = new PDPageContentStream(document, page, true, true);

        // Use of a unicode font
        PDFont font = PDTrueTypeFont.loadTTF(document, "C:/arialuni.ttf");
        font.setFontEncoding(new WinAnsiEncoding());
        stream.setFont(font, 12);

        stream.beginText();
        stream.moveTextPositionByAmount(40, 600);
        stream.drawString("سي ججس ححسيب حسججسيبنم حح ");
        stream.endText();
        stream.close();

        document.save("c:\\resultpdf.pdf");
        document.close();
    }

Thank you for your help. I tried the Unicode font downloaded from the Microsoft website, but I still have the same result.

  • Update 2

Using the 'drawUnicodeString' and 'loadTTF' methods that I got from PDFBOX-922, I was able to write Arabic characters, but they were disconnected and ordered from left to right. Here are the two methods:

    public void drawUnicodeString(String text) throws IOException {
        COSString string = new COSString();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            string.append(c >> 8);
            string.append(c & 0xff);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        string.writePDF(buffer);
        appendRawCommands(buffer.toByteArray());
        appendRawCommands(32);
        appendRawCommands(getISOBytes("Tj\n"));
    }

    public static PDType0Font loadTTF(PDDocument doc, InputStream is) throws IOException {
        /* Load the font which we will convert to Type0 font. */
        PDTrueTypeFont pdTtf = PDTrueTypeFont.loadTTF(doc, is);
        TrueTypeFont ttf = pdTtf.getTTFFont();

        CMAPEncodingEntry unicodeMap = null;
        for (CMAPEncodingEntry candidate : ttf.getCMAP().getCmaps()) {
            if (candidate.getPlatformId() == CMAPTable.PLATFORM_WINDOWS
                    && candidate.getPlatformEncodingId() == CMAPTable.ENCODING_UNICODE) {
                unicodeMap = candidate;
                break;
            }
        }
        if (unicodeMap == null) {
            throw new RuntimeException(
                    "To use as CIDFont, the TTF must have a Windows platform Unicode encoding");
        }

        float scaling = 1000f / ttf.getHeader().getUnitsPerEm();
        MyPDCIDFontType2Font pdCidFont2 = new MyPDCIDFontType2Font();
        pdCidFont2.setBaseFont(pdTtf.getBaseFont());
        pdCidFont2.setFontDescriptor((PDFontDescriptorDictionary) pdTtf.getFontDescriptor());

        /* Fixme -- should determine the minimum and maximum charcode in the map */
        int[] cid2gid = new int[65536];
        List<Float> widths = new ArrayList<Float>();
        int[] widthValues = ttf.getHorizontalMetrics().getAdvanceWidth();
        for (int i = 0; i < cid2gid.length; i++) {
            int glyph = unicodeMap.getGlyphId(i);
            cid2gid[i] = glyph;
            widths.add((float) i);
            widths.add((float) i);
            widths.add(widthValues[glyph] * scaling);
        }
        pdCidFont2.setCidToGid(cid2gid);
        pdCidFont2.setWidths(widths);
        pdCidFont2.setDefaultWidth(widths.get(0).longValue());

        /* Now construct the type0 font that we actually return */
        myType0Font pdFont0 = new myType0Font();
        pdFont0.setDescendantFont(pdCidFont2);
        pdFont0.setDescendantFonts(new COSObject(pdCidFont2.getCOSObject()));
        pdFont0.setEncoding(COSName.IDENTITY_H);
        pdFont0.setBaseFont(pdTtf.getBaseFont());
        // pdfont0.setToUnicode(COSName.IDENTITY_H); XXX how to express identity
        // mapping as ToUnicode program? */
        return pdFont0;
    }

and here are the printed characters:

[Image: disconnected Arabic letters]

I do not know why these characters are disconnected.

2 answers

Arabic can be written using PDFBOX-922 and PDFBOX-1287. (The files with the fixes are attached to the issue descriptions.) I hope that the fixes will be applied in version 2.0.


I suggest you try adding ICU4J to your project.
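The two symptoms in the question (letters not joined, text running left to right) correspond to two separate steps: shaping, which picks the joined form of each letter, and bidirectional reordering, which puts the characters in display order. ICU4J covers both (it ships `ArabicShaping` and `Bidi` classes). As a rough, dependency-free illustration of the reordering step only, here is a sketch that uses the JDK's own `java.text.Bidi` instead of ICU4J. The class name `VisualOrderSketch` and the method `toVisualOrder` are made up for this example, and the sketch does not do any shaping, so the letters would still appear disconnected.

```java
import java.text.Bidi;

// Hypothetical helper, not part of PDFBox or ICU4J.
final class VisualOrderSketch {

    /**
     * Converts a string from logical (typing) order to left-to-right
     * visual order, so that a renderer which simply lays glyphs out
     * left to right shows right-to-left text in the expected direction.
     * Assumption: no combining marks; a plain reversal would misplace them.
     */
    static String toVisualOrder(String logical) {
        // Let the first strong character decide the paragraph direction,
        // falling back to right-to-left when there is none.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);

        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];

        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left runs: reverse their characters.
            runs[i] = (levels[i] & 1) == 1
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }

        // Reorder the runs themselves from logical to visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }
}
```

With something like this in place, the string passed to 'drawUnicodeString' would be `VisualOrderSketch.toVisualOrder(text)` rather than `text`. Joining the letters would still need a shaping pass, which is the part ICU4J's `ArabicShaping` is meant for.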

