MS Word uses characters from the Wingdings and Symbol fonts for bullets; by default their hexadecimal values are F0A7 and F0B7. I want to convert the bullets to their Unicode equivalents. Of course, this depends on the actual font, so F0A7 in Wingdings becomes Unicode 25AA (▪). I found a partial mapping from Wingdings to Unicode and from Symbol to Unicode. Is there a shared library (preferably in Java) or a database for these mappings?
Since this question was asked, a large number of dingbat and emoji characters have been added to Unicode, some for the express purpose of Wingdings/Webdings compatibility.
So here is my attempt at mapping the Wingdings and Webdings encodings to Unicode. All of these characters are present in the Symbola font. In other fonts, many of the glyphs will show up as boxes or question marks.
The last one should be a Windows logo, which is not included in Unicode for trademark reasons. If anyone can find a better approximation, let me know.
Note that the digits 0-9 and the punctuation marks !#%&()+,./:;<=>?[]_{|} are the same as in ASCII.
The ©®™ characters are encoded twice, with slightly different glyphs: the first set (0xD2-0xD4) uses a serif font for the letters, and the second set (0xE2-0xE4) uses a sans-serif font. This difference may or may not matter for your font replacement purposes.
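For what it's worth, here is a minimal Java sketch of how such a lookup could be wired up. The class name `SymbolFontMapper` and the method `toUnicode` are hypothetical, not part of any existing library. Only a handful of entries are filled in: the Wingdings 0xA7 entry comes from the question, the ASCII pass-through and the duplicated ©®™ range come from the notes above (I am assuming they describe the Symbol font), and the Symbol 0xB7 bullet and the exact ®/©/™ order are from my recollection of the Adobe Symbol encoding. The sketch also assumes that Word's F0xx values are simply the font's 8-bit code shifted into the Private Use Area at U+F000.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical helper, not an existing library class. The tables are deliberately partial. */
final class SymbolFontMapper {
    private static final Map<Integer, Integer> WINGDINGS = new HashMap<>();
    private static final Map<Integer, Integer> SYMBOL = new HashMap<>();

    static {
        // Wingdings 0xA7 -> U+25AA (black small square), as stated in the question.
        WINGDINGS.put(0xA7, 0x25AA);

        // Assumption: Symbol 0xB7 is the ordinary bullet, U+2022.
        SYMBOL.put(0xB7, 0x2022);

        // Digits and this punctuation set are the same as in ASCII
        // (assumption: the note applies to the Symbol font).
        for (int c = '0'; c <= '9'; c++) {
            SYMBOL.put(c, c);
        }
        for (char c : "!#%&()+,./:;<=>?[]_{|}".toCharArray()) {
            SYMBOL.put((int) c, (int) c);
        }

        // Serif set (0xD2-0xD4) and sans-serif set (0xE2-0xE4) collapse to the
        // same code points. Assumed order within each set: registered, copyright, trademark.
        SYMBOL.put(0xD2, 0x00AE);
        SYMBOL.put(0xD3, 0x00A9);
        SYMBOL.put(0xD4, 0x2122);
        SYMBOL.put(0xE2, 0x00AE);
        SYMBOL.put(0xE3, 0x00A9);
        SYMBOL.put(0xE4, 0x2122);
    }

    /**
     * Maps a character code in the given symbol font to a Unicode code point.
     * Accepts either the raw 8-bit code (0xA7) or the F0xx form (0xF0A7).
     * Returns an empty result for unknown fonts or unmapped codes.
     */
    static OptionalInt toUnicode(String fontName, int code) {
        // Strip the Private Use Area offset if present.
        int raw = (code >= 0xF000 && code <= 0xF0FF) ? code - 0xF000 : code;

        Map<Integer, Integer> table;
        if ("Wingdings".equalsIgnoreCase(fontName)) {
            table = WINGDINGS;
        } else if ("Symbol".equalsIgnoreCase(fontName)) {
            table = SYMBOL;
        } else {
            return OptionalInt.empty();
        }

        Integer codePoint = table.get(raw);
        return codePoint == null ? OptionalInt.empty() : OptionalInt.of(codePoint);
    }

    public static void main(String[] args) {
        OptionalInt bullet = toUnicode("Wingdings", 0xF0A7);
        if (bullet.isPresent()) {
            System.out.println("U+" + Integer.toHexString(bullet.getAsInt()).toUpperCase());
        }
    }
}
```

Returning an `OptionalInt` rather than a fallback character leaves it to the caller to decide what to do with unmapped codes, for example keeping the original font run intact. The rest of each table would have to be filled in from the full mapping.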
This information is embedded in the .ttf file; I'm not sure how to access it in Java.
AFAIK java.awt.Font only supports Unicode. Apache PDFBox may have classes or methods for your needs (it has a TTF parser, if I remember correctly).