I work in Python 2.7.10 and I have some binary data:
binary_data = b'\x01\x03\x00\x00 \xe6\x10\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
(If you're really interested: it's Extended WKB geometry.)
Actually, I have this data somewhere inside a dict:
my_data = { 'something1': 5.5, 'something2': u'Some info', 'something3': b'\x01\x03\x00\x00 \xe6\x10\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', }
I want to serialize this to JSON so I can save it. The problem is that json raises an error because it tries to interpret the bytes as UTF-8:
>>> json.dumps(my_data)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python\27\Lib\json\__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "C:\Python\27\Lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python\27\Lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 5: invalid continuation byte
I could encode it manually:
import binascii
import json

my_serializable_data = dict(my_data.items())
my_serializable_data['something3'] = binascii.b2a_base64(my_serializable_data['something3'])
json.dumps(my_serializable_data)
which gives a pleasant result:
'{"something2": "Some info", "something3": "AQMAACDmEAAAAQAAAAUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADwPwAAAAAAAPA/AAAAAAAA8D8AAAAAAADwPwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==\\n", "something1": 5.5}'
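(For what it's worth, the manual step could at least be factored into a helper; `encode_binary_values` and `dumps_with_binary` below are just placeholder names I'm making up. But I'd still have to remember to call this custom entry point everywhere instead of plain `json.dumps`:)

```python
import binascii
import json


def encode_binary_values(obj):
    # Recursively walk dicts and lists, base64-encoding any byte strings
    # so the result is safe to hand to json.dumps.
    if isinstance(obj, bytes):
        # b2a_base64 returns bytes with a trailing newline; decode to text.
        return binascii.b2a_base64(obj).decode('ascii')
    if isinstance(obj, dict):
        return dict((k, encode_binary_values(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [encode_binary_values(v) for v in obj]
    return obj


def dumps_with_binary(obj, **kwargs):
    # Wrapper that pre-converts binary values, then delegates to json.dumps.
    return json.dumps(encode_binary_values(obj), **kwargs)
```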
But that would be cumbersome, since I would need to repeat it throughout the application. I would rather configure json's behavior to handle this binary data itself. The usual way to tell json how to serialize something is to override JSONEncoder.default, as follows:
class MyJsonEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, str):
            return binascii.b2a_base64(o)
        return super(MyJsonEncoder, self).default(o)
But this does not work, apparently because str handling is hardcoded in JSONEncoder:
>>> json.dumps(my_data, cls=MyJsonEncoder)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Python\27\Lib\json\__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "C:\Python\27\Lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python\27\Lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe6 in position 5: invalid continuation byte
Overriding JSONEncoder.encode should work, but I would need to reimplement significant logic from the standard library, since that method knows how to walk arbitrarily nested combinations of lists and dicts. I would rather not; it would be horribly fragile and error-prone. (Also, looking at the source code, some of this logic lives in module-level functions of json, which makes the idea even messier.)
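(To sketch what I mean: the only variant of this I can imagine that doesn't reimplement iterencode is pre-converting the whole structure inside encode and then delegating to the stock encoder. `BinaryJsonEncoder` and `_convert` are names I'm making up here, and this still wouldn't cover code paths that call iterencode directly, such as streaming via json.dump:)

```python
import base64
import json


class BinaryJsonEncoder(json.JSONEncoder):
    # Sketch: rather than reimplementing iterencode, pre-walk the object,
    # replace byte strings with base64 text, then delegate to the stock
    # encoder. Note that json.dump streams through iterencode and would
    # bypass this override.

    def _convert(self, o):
        if isinstance(o, bytes):
            return base64.b64encode(o).decode('ascii')
        if isinstance(o, dict):
            return dict((k, self._convert(v)) for k, v in o.items())
        if isinstance(o, (list, tuple)):
            return [self._convert(v) for v in o]
        return o

    def encode(self, o):
        return super(BinaryJsonEncoder, self).encode(self._convert(o))
```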
It is important to note that deserializing the data later is not a concern in this situation. This is for logging purposes; when the data is deserialized, it will be read by a developer. If they really need to do something with it, they can simply decode it manually. I am also willing to accept the tradeoff that if some text is stored as str rather than unicode, it will get base64-encoded anyway. (As an alternative, I could base64-encode a value only if it contains non-printable or non-ASCII characters, but I can't even make that decision until I solve the problem I'm asking about here.)
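(The check I have in mind for that alternative would be something along these lines; `needs_base64` is just an illustrative name:)

```python
def needs_base64(s):
    # Sketch: treat a byte string as binary if any byte falls outside the
    # printable ASCII range. bytearray yields ints in both Python 2 and 3.
    return any(b < 0x20 or b > 0x7e for b in bytearray(s))
```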
So, how can I override this behavior without having to rebuild too much of JSONEncoder?