protobuf-net: serialized size property

We use protobuf-net to serialize and deserialize messages in an application whose open protocol is based on Google protocol buffers. The library is excellent and covers all of our requirements, except for this: we need to know the length of the serialized message in bytes before the message is actually serialized.

This question was already asked a year and a half ago, and according to Marc, the only way to do this is to serialize into a MemoryStream and read the .Length property afterwards. This is unacceptable in our case, because a MemoryStream allocates a byte buffer behind the scenes, and we must avoid that.

This line from the same answer gives us hope that it might be possible after all:

If you clarify what the use case is, I am sure that we can make it easily available (if it is not already).

Here is our use case. We have messages ranging in size from a few bytes to two megabytes. The application preallocates the byte buffers used for socket operations, as well as for serialization/deserialization, and once the warm-up phase is over, no additional buffers can be created (hint: avoiding GC and heap fragmentation). Byte buffers are essentially pooled. We also want to avoid copying bytes between buffers/streams as much as possible.

We have come up with two possible strategies, and both of them require knowing the message size upfront:

  • Use (large) fixed-size byte buffers and serialize all messages that can fit into a single buffer; send the contents of the buffer using Socket.Send. We need to know when the next message cannot fit into the buffer and stop serializing. Without the message size, the only way to achieve this is to wait for an exception to occur during Serialize.
  • Use (small) variable-size byte buffers and serialize each message into one buffer; send the contents of the buffer using Socket.Send. To check out a byte buffer of the appropriate size from the pool, we need to know how many bytes the serialized message has.
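
Once the body length is known, both strategies reduce to the same arithmetic: add the size of the Varint32 prefix to the body length, then either compare the total against the space left in the fixed buffer or use it to pick a pooled buffer. A minimal sketch of that arithmetic follows. The names Framing, VarintSize and Fits are hypothetical helpers for illustration, not part of protobuf-net.

```csharp
using System;

// Hypothetical helper, not a protobuf-net API.
static class Framing
{
    // Number of bytes a base-128 varint needs to encode a 32-bit unsigned value.
    public static int VarintSize(uint value)
    {
        int size = 1;
        while (value >= 0x80)
        {
            value >>= 7;   // each varint byte carries 7 payload bits
            size++;
        }
        return size;
    }

    // True if the length prefix plus the body fit into the space that is
    // still free in a buffer of the given capacity, starting at 'offset'.
    public static bool Fits(int bodyLength, int capacity, int offset)
    {
        int framedLength = VarintSize((uint)bodyLength) + bodyLength;
        return framedLength <= capacity - offset;
    }
}
```

For the second strategy, the same framed length (prefix plus body) would be the size requested from the buffer pool.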

Since the protocol is already defined (we cannot change it) and requires the message length prefix to be Varint32, we cannot use the SerializeWithLengthPrefix method.

Is it possible to add a method that estimates the size of a message without serializing to a stream? If this does not fit the library's current feature set and roadmap but is doable, we are interested in extending the library ourselves. We are also looking for alternative approaches, if there are any.

c# protobuf-net
1 answer

As already noted, this is not immediately available, because the code intentionally makes a single pass over the data (especially for IEnumerable<T> and similar). Depending on your data, however, it may already be doing a moderate amount of copying: sub-messages are also length-prefixed, so some juggling may be required. This juggling can be greatly reduced by using the "grouped" sub-format inside the message, since groups allow forwards-only construction without backtracking.

So, is it possible to add a method that estimates the size of a message without serializing to a stream?

An estimate is next to worthless; since there is no terminator, the length must be exact. Ultimately, the sizes are a little hard to predict without actually doing the serialization. v1 had some code for predicting the size, but the single-pass code is currently preferred, and in most cases the buffer overhead is nominal (there is code to reuse the internal buffers so that it does not spend all its time allocating buffers for small messages).

If your message is internally forwards-only (grouped), then one cheat is to serialize to a fake stream that measures the data but discards it all; however, you will end up serializing twice.
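
A minimal sketch of such a measuring stream, using only System.IO: it counts the bytes written to it and stores nothing. The class name CountingStream is an assumption for illustration; it is not a protobuf-net type.

```csharp
using System;
using System.IO;

// Hypothetical write-only stream that counts bytes and discards them.
sealed class CountingStream : Stream
{
    private long _length;

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;

    // Total number of bytes written so far.
    public override long Length => _length;

    public override long Position
    {
        get => _length;
        set => throw new NotSupportedException();
    }

    // Only the count is recorded; the data itself is dropped.
    public override void Write(byte[] buffer, int offset, int count) => _length += count;
    public override void WriteByte(byte value) => _length++;
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

The intended use would be to pass an instance to Serializer.Serialize, read Length, and then serialize a second time into the real buffer. Whether the library's own internal buffering stays allocation-free during the measuring pass is not something this sketch can guarantee.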

Re:

and requires the message length prefix to be Varint32, we cannot use the SerializeWithLengthPrefix method

I'm not quite sure I see the relationship there; SerializeWithLengthPrefix allows a range of formats here. Perhaps you can be more specific?

Re copying data around: an idea I have been playing with is to use sub-normal forms for the length prefix. For example, in most cases 5 bytes is plenty, so instead of juggling, the writer could reserve 5 bytes and then simply overwrite them without condensing (since the octet 10000000 still means "zero, and continue", even if it is redundant). This would still need buffering (to allow the backfill), but would not require moving the data.
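
To make the padded-prefix idea concrete, here is a minimal sketch that writes a 32-bit length as exactly 5 varint bytes, next to an ordinary varint reader that decodes it to the same value. PaddedVarint and its members are hypothetical names, not protobuf-net API. The sketch also assumes that the receiving side tolerates non-minimal varints, which would need to be checked against the fixed protocol.

```csharp
using System;

// Hypothetical helper illustrating a fixed-width ("sub-normal") varint.
static class PaddedVarint
{
    public const int Width = 5;

    // Writes 'value' as exactly 5 bytes. The first four bytes always carry
    // the continuation bit, even when their 7 payload bits are all zero.
    public static void Write(byte[] buffer, int offset, uint value)
    {
        for (int i = 0; i < Width - 1; i++)
        {
            buffer[offset + i] = (byte)((value & 0x7F) | 0x80);
            value >>= 7;
        }
        // After 28 bits only the top 4 bits remain; no continuation bit.
        buffer[offset + Width - 1] = (byte)value;
    }

    // Ordinary base-128 varint reader, limited to 5 bytes.
    public static uint Read(byte[] buffer, int offset, out int bytesRead)
    {
        uint result = 0;
        int shift = 0;
        bytesRead = 0;
        while (true)
        {
            byte b = buffer[offset + bytesRead++];
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
            if (shift > 28) throw new FormatException("Varint32 longer than 5 bytes.");
        }
    }
}
```

With this, a writer could skip 5 bytes, serialize the body directly behind them, and backfill the prefix afterwards without moving the body. The cost is up to 4 wasted bytes per message.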

A final simple idea: serialize to a FileStream; then write the file length followed by the file data. Obviously, this trades memory usage for I/O.
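
A rough sketch of that file-spooling approach, under stated assumptions: writeBody is a hypothetical callback standing in for the actual serialization call, and SpoolToFile and WriteFramed are illustrative names, not library API.

```csharp
using System;
using System.IO;

// Hypothetical helper: spool the body to a temp file, then emit
// a varint length prefix followed by the file contents.
static class SpoolToFile
{
    public static void WriteFramed(Stream destination, Action<Stream> writeBody)
    {
        string path = Path.GetTempFileName();
        try
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
            {
                writeBody(file);               // e.g. the serializer writes here
                file.Flush();
                WriteVarint(destination, (uint)file.Length);
                file.Position = 0;
                file.CopyTo(destination);      // note: CopyTo uses its own copy buffer
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    // Minimal base-128 varint writer for a 32-bit unsigned value.
    private static void WriteVarint(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }
}
```

Stream.CopyTo allocates a temporary copy buffer internally, so an application with a strict no-allocation rule would likely replace it with a read/write loop over one of its pooled buffers.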
