Is there a safe Python serialization library that handles str / unicode correctly?

Besides PyYAML, are there any safe Python data serialization libraries that correctly handle unicode / str?

For instance:

    >>> json.loads(json.dumps([u"x", "x"]))
    [u'x', u'x']          # Both unicode
    >>> msgpack.loads(msgpack.dumps([u"x", "x"]))
    ['x', 'x']            # Neither is unicode
    >>> bson.loads(bson.dumps({"x": [u"x", "x"]}))
    {u'x': [u'x', 'x']}   # Dict keys become unicode
    >>> pyamf.decode(pyamf.encode([u"x", "x"])).next()
    [u'x', u'x']          # Both are unicode

Note that I want the serializers to be safe (which is why pickle and marshal are out). PyYAML is an option, but I don't like the complexity of YAML, so I would like to know if there are other options.
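To make the safety requirement concrete, here is a minimal sketch of why pickle is excluded: unpickling can invoke an arbitrary callable chosen by whoever produced the payload. (Shown in Python 3 syntax; the class name `Evil` is hypothetical, and `list` stands in for something nastier like `os.system`.)

```python
import pickle

class Evil(object):
    def __reduce__(self):
        # Any callable plus arguments can go here; pickle will call it
        # on load. Here it is harmless, but it need not be.
        return (list, ("pwned",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)  # calls list("pwned"), not Evil.__init__
print(result)
```

The deserialized value is `['p', 'w', 'n', 'e', 'd']`, not an `Evil` instance: the stream controlled the call, which is exactly what a safe serializer must rule out.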

Edit: There seems to be some confusion about the nature of my data. Some of it is unicode (for example, names), and some of it is binary (for example, images). Therefore, a serialization library that conflates unicode and str is as useless to me as a library that conflates "42" and 42.
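To illustrate the mixed-data case (sketched in Python 3 terms, where the old str / unicode pair became bytes / str; the record contents are made up): json does not merely confuse the two types, it refuses binary data outright, so such a record cannot round-trip at all without a hand-rolled convention like base64 plus a type tag.

```python
import json

# Hypothetical mixed record: text field plus a binary field.
record = {"name": "Ren\xe9e", "avatar": b"\x89PNG\r\n"}

try:
    json.dumps(record)
    json_handles_bytes = True
except TypeError:
    # "Object of type bytes is not JSON serializable"
    json_handles_bytes = False

print(json_handles_bytes)
```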

3 answers

Have you tried bert?

    >>> import bert
    >>> bert.decode(bert.encode([u"x", "x"]))
    [u'x', 'x']
    >>> bert.decode(bert.encode({"x": [u"x", "x"]}))
    {'x': [u'x', 'x']}

(To install it, you will first have to install erlastic manually, due to this outstanding pull request.)


Maybe just use Python's repr to serialize the value and deserialize it with ast.literal_eval:

    In [7]: ast.literal_eval(repr({"d": ["x", u"x"]}))
    Out[7]: {'d': ['x', u'x']}
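For what it's worth, this round trip does preserve the questioner's distinctions (shown here in Python 3, where the pair is str / bytes), and it stays safe because `ast.literal_eval` only evaluates literals, never function calls:

```python
import ast

# Text vs. binary, and "42" vs. 42, all in one record.
original = {"d": ["x", b"x"], "n": 42, "s": "42"}

dumped = repr(original)              # plain text, safe to store or send
restored = ast.literal_eval(dumped)  # parses literals only; no code runs

print(restored == original)  # True: every type survives the round trip
```

The main trade-off is that the format is Python-specific and only covers types with literal syntax (no datetimes, sets are fine, arbitrary objects are not).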

Looking for the same thing, I found that msgpack-python 0.4 now supports the str / unicode distinction via the use_bin_type / encoding arguments:

    >>> msgpack.unpackb(
    ...     msgpack.packb(["uu\x00u", u"adsa\xe4"], use_bin_type=True, encoding="utf-8"),
    ...     encoding="utf-8")
    ['uu\x00u', u'adsa\xe4']
