Mac OS Text Decoding in Python

I am writing code to parse RTF documents and must handle the various code pages they can use. Python comes with decoders for all the necessary Windows code pages, but I'm not sure how to handle the Mac ones:

# 77: "10000", # Mac Roman
# 78: "10001", # Mac Shift Jis
# 79: "10003", # Mac Hangul
# 80: "10008", # Mac GB2312
# 81: "10002", # Mac Big5
# 83: "10005", # Mac Hebrew
# 84: "10004", # Mac Arabic
# 85: "10006", # Mac Greek
# 86: "10081", # Mac Turkish
# 87: "10021", # Mac Thai
# 88: "10029", # Mac East Europe
# 89: "10007", # Mac Russian

Does Python have built-in support for these? If not, is there a pure-Python, cross-platform library that will handle them?

3 answers

You can use Python's built-in codecs for them; they are registered under names such as "mac-roman" and "mac-turkish".

 >>> 'foo'.decode('mac-turkish')
 u'foo'

You will have to refer to them by name; the code page numbers in your question do not appear in the source files. See $pylib/encodings/mac_*.py for more information.
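
Since the codecs are only reachable by name, one way to bridge the gap is a small lookup table from code page number to codec name. The following is a minimal sketch for Python 3 (where decode is a method of bytes), not a complete solution: the table MAC_CODEPAGE_TO_CODEC and the helper decode_mac are hypothetical names, and the pairing of each number with a codec is my assumption, limited to single-byte Mac encodings that have a stdlib codec.

```python
# Assumed mapping from the code page numbers in the question to Python
# codec names. Only single-byte Mac encodings with a stdlib codec are listed.
MAC_CODEPAGE_TO_CODEC = {
    10000: "mac_roman",     # Mac Roman
    10006: "mac_greek",     # Mac Greek
    10007: "mac_cyrillic",  # Mac Russian (assumed to match Mac Cyrillic)
    10029: "mac_latin2",    # Mac East Europe (assumed to match Mac Latin 2)
    10081: "mac_turkish",   # Mac Turkish
}


def decode_mac(data, codepage, errors="strict"):
    """Decode bytes using the codec assumed for the given code page number."""
    try:
        name = MAC_CODEPAGE_TO_CODEC[codepage]
    except KeyError:
        raise LookupError("no codec known for code page %d" % codepage)
    return data.decode(name, errors)


# Byte 0x8E is "e with acute accent" in Mac Roman.
print(decode_mac(b"caf\x8e", 10000))
```

Code pages missing from the table (Shift JIS, Hangul, Thai and so on) raise LookupError here, so the caller can decide how to fall back.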


It looks like the Python stdlib includes at least the Mac Roman and Mac Turkish encodings, under the names macroman and macturkish. See http://svn.python.org/projects/python/trunk/Lib/encodings/aliases.py for the complete list of current encoding aliases.
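
To check which alias spellings your own interpreter accepts, you can ask the codec registry directly. This is a small sketch for Python 3; the helper name canonical_name is hypothetical, and the output depends on the installed Python version.

```python
import codecs


def canonical_name(alias):
    """Return the registry's name for an alias, or None if it is unknown."""
    try:
        return codecs.lookup(alias).name
    except LookupError:
        return None


for alias in ("macroman", "macturkish", "mac-roman", "mac_roman"):
    print(alias, "->", canonical_name(alias))
```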


No.

However, unicode.org does publish mapping files for these encodings, which you can use to generate codec modules that decode them. The Python source distribution includes a script that converts these files: Python-xx/Tools/unicode/gencodec.py .
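
To illustrate the idea without running gencodec.py, here is a rough sketch for Python 3 that builds a decoding table by hand and passes it to codecs.charmap_decode. It is not what gencodec.py emits. The two-line sample mapping is made up for the example, the function names are hypothetical, and treating bytes 0x00-0x7F as plain ASCII is an assumption that a real mapping file may override.

```python
import codecs

# Made-up excerpt in the style of a unicode.org mapping file:
# byte value, tab, Unicode code point, tab, comment.
SAMPLE_MAPPING = """\
# Hypothetical excerpt, not a complete table
0x80\t0x00C4\t# LATIN CAPITAL LETTER A WITH DIAERESIS
0x8E\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE
"""


def build_decoding_table(mapping_text):
    """Build a 256-character table indexed by byte value."""
    # Assumption: lower half is ASCII; upper half starts out undefined.
    table = [chr(i) for i in range(128)] + ["\ufffe"] * 128
    for line in mapping_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a byte with no mapping
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return "".join(table)


def decode_with_table(data, table, errors="strict"):
    """Decode bytes through the table; U+FFFE entries count as undefined."""
    text, _consumed = codecs.charmap_decode(data, errors, table)
    return text


table = build_decoding_table(SAMPLE_MAPPING)
print(decode_with_table(b"caf\x8e", table))
```

Bytes that the mapping leaves undefined raise UnicodeDecodeError under the default "strict" error handling.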
