Yet another encoding question: I'm dealing with an IBM mainframe that uses the IBM870 encoding, which is not supported by Python.
Fortunately, a talented coder has put together a script that generates the appropriate encoding definitions for Python from the character lists available at FileFormat.info.
List used: IBM870 Character List
The generated encoding can be seen here: cp870.py
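For context, here is a minimal sketch of how I understand a charmap-based codec to work. It is not the contents of cp870.py. The names `toy_encode` and `toy_decode` are hypothetical, and the table is a toy: only NUL and the 0x83 -> 'c' pair (taken from the session further down in this post) are filled in, with everything else left undefined.

```python
import codecs

# Toy decoding table: index = byte value, entry = character.
# u'\ufffe' is the conventional "undefined" marker in charmap tables.
table = [u'\ufffe'] * 256
table[0x00] = u'\x00'
table[0x83] = u'c'   # the only real mapping used here, taken from this post
decoding_table = u''.join(table)

# Build the reverse (character -> byte) lookup from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)


def toy_encode(text, errors='strict'):
    # charmap_encode returns (encoded_bytes, length_consumed)
    return codecs.charmap_encode(text, errors, encoding_table)[0]


def toy_decode(data, errors='strict'):
    # charmap_decode returns (decoded_text, length_consumed)
    return codecs.charmap_decode(data, errors, decoding_table)[0]
```

The traceback below shows cp870.py calling `codecs.charmap_encode(input,errors,encoding_table)`, so I assume the generated module follows this same pattern with a full 256-entry table.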
The system is RHEL 6.3, running Python 2.6:
Python 2.6.6 (r266:84292, Aug 28 2012, 10:55:56) [GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
cp870.py has been placed in:
/usr/lib64/python2.6/encodings/
The alias entries for the new encoding have been added to:
/usr/lib64/python2.6/encodings/aliases.py
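The entries themselves are not reproduced here. Judging by the lookup output in the next step, they should be equivalent to the following sketch, where `cp870_aliases` is a hypothetical stand-alone dict used only for illustration (in aliases.py the entries live inside the existing `aliases` dict):

```python
# Assumed shape of the added entries, reconstructed from the lookup
# output in this post rather than copied from aliases.py.
cp870_aliases = {
    '870': 'cp870',
    'csibm870': 'cp870',
    'ibm870': 'cp870',
}


def find(q, table):
    # Same substring search as the find() helper used in this post,
    # but taking the table to search as an explicit argument.
    return [(k, v) for k, v in table.items() if q in k or q in v]
```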
The alias is picked up correctly, as shown here (thanks to this answer):
>>> from encodings.aliases import aliases
>>> def find(q):
...     return [(k,v) for k, v in aliases.items() if q in k or q in v]
...
>>> find('870')
[('ibm870', 'cp870'), ('870', 'cp870'), ('csibm870', 'cp870')]
>>> find('cp870')
[('ibm870', 'cp870'), ('870', 'cp870'), ('csibm870', 'cp870')]
>>> find('ibm870')
[('ibm870', 'cp870'), ('csibm870', 'cp870')]
When I try to encode() some characters, it does not work as expected:
>>> 'c'.encode('cp870')
'\x83'
>>> 'č'.encode('cp870')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/encodings/cp870.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
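One observation about the traceback, offered as a sketch rather than a diagnosis: the error is a UnicodeDecodeError from the 'ascii' codec, and the byte it names, 0xc4, is the first byte of 'č' in UTF-8. Assuming my terminal is UTF-8 (an assumption, I have not verified it), the same message can be reproduced without cp870 being involved at all. The name `ascii_decode_error` is hypothetical.

```python
# The two bytes a UTF-8 terminal would send for 'č' (assumption: UTF-8 input).
raw = b'\xc4\x8d'


def ascii_decode_error(data):
    # Return the error message from an ASCII decode, or None if it succeeds.
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw)
print(message)
```

If that is what is happening, the failure would occur while the byte string is being converted to unicode, before the cp870 table is ever consulted.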
This is what '\x83' should be according to cp870.py:
u'\x83'
As I am just starting out with Python, can someone tell me what else is needed for Python to load and use this encoding correctly?