Python protocol buffers - unicode decoding error

I need to get a protocol buffer message on my python-torornado server and pull material from a binary message.

postContent = self.request.body message = prototemp.ReqMessage() message.ParseFromString(postContent) 

It works great with a test tool. When I run it in a sandbox environment and model 1000 requests from my client, it works in certain cases, but in most requests it throws an exception -

  File "server1.py", line 21, in post message.ParseFromString(postContent) File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/message.py", line 179, in ParseFromString self.MergeFromString(serialized) File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 755, in MergeFromString if self._InternalParse(serialized, 0, length) != length: File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse pos = field_decoder(buffer, new_pos, end, self, field_dict) File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 544, in DecodeField if value._InternalParse(buffer, pos, new_pos) != new_pos: File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse pos = field_decoder(buffer, new_pos, end, self, field_dict) File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 410, in DecodeField field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8') UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 1: invalid continuation byte 

In some other cases, it gives these errors -

 UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 3: invalid start byte UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 3: unexpected end of data 

What could be the reason?

+4
source share
2 answers

I had exactly the same problem with RabbitMQ and protocol buffers. The problem is that the protocol buffer assumes that the input will be of type str, while RabbitMQ seems to decode the message as unicode in some cases (if the byte array contains bytes greater than 127). The same thing can happen with a tornado. It still seems that the problem can be solved by following the code snippet:

 body = self.request.body if type(body) == unicode: data = bytearray(body, "utf-8") body = bytes(data) message = whatever.FromString(body) 

This code turns a Unicode string into a python byte object that can be successfully parsed by protocol buffer messages. I don't know if there is a better way to do this, but at least it works.

+4
source

I am facing the same problem.

Here is the link [1].

In this case, we must use bytes.

Thanks.

[1] https://developers.google.com/protocol-buffers/docs/proto#scalar

+1
source

All Articles