Does Python Pickle have an illegal character / sequence that I can use as a delimiter?

I want to make (and decode) a single line consisting of several python pickles.

Is there a character or sequence that can be used as a delimiter on this line?

I have to make a line like this:

s = pickle.dumps(o1) + PICKLE_SEPARATOR + pickle.dumps(o2) + PICKLE_SEPARATOR + pickle.dumps(o3) ... 

I should be able to take this line and restore the objects as follows:

 [pickle.loads(s) for s in input.split(PICKLE_SEPARATOR)] 

What should be PICKLE_SEPARATOR?


For the curious, I want to send pickled objects to redis using APPEND (although maybe I'll just use RPUSH instead).

+7
python pickle
4 answers

It works fine to simply concatenate the pickles; Python knows where each one ends:

 >>> import cStringIO as stringio
 >>> import cPickle as pickle
 >>> o1 = {}
 >>> o2 = []
 >>> o3 = ()
 >>> p = pickle.dumps(o1)+pickle.dumps(o2)+pickle.dumps(o3)
 >>> s = stringio.StringIO(p)
 >>> pickle.load(s)
 {}
 >>> pickle.load(s)
 []
 >>> pickle.load(s)
 ()
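For readers on Python 3, where cStringIO and cPickle no longer exist and pickles are bytes, roughly the same session might look like the sketch below. It assumes io.BytesIO as the in-memory stream; the variable names stream and restored are just for illustration.

```python
import io
import pickle

o1, o2, o3 = {}, [], ()

# Concatenate the three pickles with no separator at all.
p = pickle.dumps(o1) + pickle.dumps(o2) + pickle.dumps(o3)

# Each pickle.load() call consumes exactly one pickle and leaves the
# stream positioned at the start of the next one.
stream = io.BytesIO(p)
restored = [pickle.load(stream) for _ in range(3)]
print(restored)  # [{}, [], ()]
```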
+7

I don't use Python much, but is there a reason why you couldn't just pickle a list? Then pickling becomes

 s = pickle.dumps([o1,o2,o3]) 

and reconstruction becomes

 objs = pickle.loads(s) 

Edit 1: Also, according to this answer, pickled output is self-delimiting, so you can pickle with

 s = ''.join(map(pickle.dumps,[o1,o2,o3])) 

and restore using

 import pickle
 import StringIO

 sio = StringIO.StringIO(s)
 objs = []
 try:
     # keep loading until the stream runs out of pickles
     while True:
         objs.append(pickle.load(sio))
 except EOFError:
     pass

I'm not sure there is an advantage to this over pickling a single list. (There may also be a nicer way to write it than this nasty loop/exception combination that I haven't seen; as I said, I don't use Python much.)
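One way to tidy up the loop is to wrap it in a generator. This is only a sketch for Python 3 (bytes and io.BytesIO instead of StringIO); the helper name iter_pickles is made up here, not part of the pickle module.

```python
import io
import pickle


def iter_pickles(data):
    """Yield each object from a bytes string of back-to-back pickles."""
    stream = io.BytesIO(data)
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            # pickle.load raises EOFError once the stream is exhausted.
            return


s = b"".join(map(pickle.dumps, [{"a": 1}, [1, 2], (3,)]))
objs = list(iter_pickles(s))
```

Because it is a generator, the caller can also stop early without unpickling the rest of the data.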

+2

EDIT: First, consider gnibbler's answer, which is obviously much simpler. The only reason to prefer the approach below is if you want to split the sequence of pickles without parsing them.

A reasonably safe bet is to use a fresh UUID that you will never use anywhere else. Evaluate uuid.uuid4().bytes once and store the result in your code as the delimiter. For example:

 >>> import uuid
 >>> uuid.uuid4().bytes
 '\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8'

Then copy the resulting string literal into your code as the delimiter (or even just use the one above if you want). The value is 16 essentially random bytes, so it is pretty much guaranteed that the same sequence will never occur in anything you will ever want to store.
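A minimal Python 3 sketch of how the hard-coded delimiter might be used. The helper names join_pickles and split_pickles are hypothetical, and the whole thing rests on the assumption that the separator bytes never appear inside any pickle.

```python
import pickle

# Generated once with uuid.uuid4().bytes and then hard-coded (the value
# shown above, written as a bytes literal for Python 3).
PICKLE_SEPARATOR = b"\xae\x9fW\xff\x19cG\x0c\xb1\xe1\x1aV%P\xb7\xa8"


def join_pickles(objects):
    """Pickle each object and join the results with the separator."""
    return PICKLE_SEPARATOR.join(pickle.dumps(o) for o in objects)


def split_pickles(line):
    """Split on the separator first, then unpickle each chunk."""
    return [pickle.loads(chunk) for chunk in line.split(PICKLE_SEPARATOR)]


line = join_pickles([{"a": 1}, [1, 2], "three"])

# The chunks can be separated without unpickling anything.
chunks = line.split(PICKLE_SEPARATOR)
```

Splitting first is the only real gain over plain concatenation: individual pickles can be counted, stored, or forwarded without being loaded.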

+2

One solution would be to prefix each pickle in your string with its length, so the reader knows how many characters belong to each element.
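A rough Python 3 sketch of that idea, assuming a fixed 4-byte big-endian length header in front of every pickle. The header format and the names pack_pickles and unpack_pickles are arbitrary choices for illustration, not anything pickle itself defines.

```python
import pickle
import struct

# 4-byte big-endian unsigned length; an arbitrary choice for this sketch.
HEADER = struct.Struct(">I")


def pack_pickles(objects):
    """Emit length + payload for each object, back to back."""
    parts = []
    for obj in objects:
        payload = pickle.dumps(obj)
        parts.append(HEADER.pack(len(payload)))
        parts.append(payload)
    return b"".join(parts)


def unpack_pickles(data):
    """Walk the buffer, reading each length header and then its payload."""
    objs = []
    offset = 0
    while offset < len(data):
        (length,) = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        objs.append(pickle.loads(data[offset:offset + length]))
        offset += length
    return objs


packed = pack_pickles([{"a": 1}, [1, 2], (3,)])
unpacked = unpack_pickles(packed)
```

As with the separator approach, the lengths make it possible to skip over or slice out individual pickles without loading them.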

-1