Mac OS Text Decoding in Python

Question

I am writing code to parse RTF documents and need to handle the various code pages they can use. Python ships with decoders for all the necessary Windows code pages, but I'm not sure how to handle the Mac ones:

# 77: "10000", # Mac Roman
# 78: "10001", # Mac Shift Jis
# 79: "10003", # Mac Hangul
# 80: "10008", # Mac GB2312
# 81: "10002", # Mac Big5
# 83: "10005", # Mac Hebrew
# 84: "10004", # Mac Arabic
# 85: "10006", # Mac Greek
# 86: "10081", # Mac Turkish
# 87: "10021", # Mac Thai
# 88: "10029", # Mac East Europe
# 89: "10007", # Mac Russian

Does Python have built-in support for these? If not, is there a cross-platform, pure-Python library that handles them?

+6

python macos

Brendon Oct 20 '09 at 7:05

3 answers

It looks like the Python stdlib includes at least the Mac Roman and Mac Turkish encodings, under the names macroman and macturkish. See http://svn.python.org/projects/python/trunk/Lib/encodings/aliases.py for the complete list of current encoding aliases.
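The alias table mentioned above can also be inspected from a running interpreter. This is a minimal sketch, assuming Python 3 and that every Mac codec's canonical name starts with "mac_"; the variable names are only illustrative.

```python
from encodings.aliases import aliases

# Map each alias (e.g. "macroman") to its canonical codec name (e.g. "mac_roman"),
# keeping only entries whose target looks like a Mac codec.
mac_aliases = {
    alias: target
    for alias, target in aliases.items()
    if target.startswith("mac_")
}

# The distinct Mac codecs reachable through the alias table.
mac_codecs = sorted(set(mac_aliases.values()))

print(mac_codecs)
```

The exact list depends on the Python version in use, so it is worth running this on the interpreter the parser will target.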

+3

Tuure laurinolli Oct 20 '09 at 7:10

No.

However, unicode.org does provide codec description files that you can use to generate modules that handle these encodings. The Python source distribution includes a script that converts these files: Python-xx/Tools/unicode/gencodec.py.

+1

habnabit Oct 20 '09 at 7:10

Jerub · Accepted Answer · 2009-10-20T07:09:54+0000

You can use Python's built-in codecs for them, which go by the names "mac-roman", "mac-turkish", etc.

 >>> 'foo'.decode('mac-turkish')
 u'foo'

You will have to refer to them by name; the numbers in your question do not appear in the source files. See $pylib/encodings/mac_*.py for more information.
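Since the codecs are addressed by name rather than by code page number, an RTF parser needs its own lookup table. The following is a rough sketch for Python 3, not a complete solution: the table MAC_CODE_PAGES and the helper decode_mac are hypothetical names, the pairing of 10029 (Mac East Europe) with mac_latin2 and of 10007 (Mac Russian) with mac_cyrillic is an assumption, and the multi-byte and right-to-left Mac code pages from the question are deliberately left unmapped.

```python
import codecs

# Hypothetical table: Mac code page number -> Python codec name.
# Only single-byte codecs listed in the standard encodings table are included;
# the 10029 and 10007 pairings are assumptions based on the code page labels.
MAC_CODE_PAGES = {
    10000: "mac_roman",
    10006: "mac_greek",
    10007: "mac_cyrillic",
    10029: "mac_latin2",
    10081: "mac_turkish",
}


def decode_mac(data, code_page):
    """Decode bytes using the codec mapped to a Mac code page number."""
    try:
        name = MAC_CODE_PAGES[code_page]
    except KeyError:
        raise LookupError(f"no codec mapped for code page {code_page}") from None
    return data.decode(name)


# Fail early if any mapped codec is missing from this interpreter.
for codec_name in MAC_CODE_PAGES.values():
    codecs.lookup(codec_name)

print(decode_mac(b"foo", 10081))
```

Raising LookupError for unmapped code pages keeps the failure visible instead of silently decoding with the wrong table.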
