Google Protocol Buffers serialization freezes when writing 1 GB+ of data

I serialize a large dataset using protocol buffer serialization. When my dataset contains 400,000 user objects with a combined size of about 1 GB, serialization returns within 3-4 seconds. But when my dataset contains 450,000 objects with a combined size of about 1.2 GB, the serialization call never returns and the CPU stays fully consumed.

I am using the .NET Protocol Buffers port (protobuf-net).

+7
2 answers

Looking at the new comments, this appears to be (as the OP notes) a MemoryStream limit being hit. A slight annoyance in the protobuf specification is that, because sub-message lengths are variable and must prefix the sub-message, it is often necessary to buffer portions of the output until the length is known. This is fine for most reasonable graphs, but with an exceptionally large graph (excluding the "root object has millions of direct children" scenario, which does not suffer from this) it can end up doing quite a lot of buffering in memory.

If you are not tied to a specific layout (perhaps because of .proto interop with an existing client), then a simple fix is as follows: on the child (sub-object) properties (including lists/arrays of sub-objects), tell it to use "group" serialization. This is not the default layout; it means "instead of using a length prefix, use a start/end pair of tokens". The downside is that if your deserialization code does not know about a particular object, it takes longer to skip that field, because it cannot just say "seek forward 231413 bytes" - it has to walk the tokens to find out where the object ends. In most cases this is not an issue at all, since your deserialization code fully expects that data.

To do this:

[ProtoMember(1, DataFormat = DataFormat.Group)]
public SomeType SomeChild { get; set; }

....

[ProtoMember(4, DataFormat = DataFormat.Group)]
public List<SomeOtherType> SomeChildren { get { return someChildren; } }
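To see what this changes on the wire, here is a minimal runnable sketch; the WithPrefix, WithGroup and Child types are hypothetical, made up just for the comparison. The first output line shows the length-prefixed form (a varint length that forces buffering of the payload), the second the group form (a start/end token pair that can be streamed directly):

using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
class Child
{
    [ProtoMember(1)] public string Name { get; set; }
}

[ProtoContract]
class WithPrefix
{
    // default layout: length-prefixed sub-message
    [ProtoMember(1)] public Child Child { get; set; }
}

[ProtoContract]
class WithGroup
{
    // group layout: start/end token pair, no length prefix
    [ProtoMember(1, DataFormat = DataFormat.Group)] public Child Child { get; set; }
}

static class WireDemo
{
    static byte[] ToBytes<T>(T obj)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, obj);
            return ms.ToArray();
        }
    }

    static void Main()
    {
        var child = new Child { Name = "x" };
        // 0A-03-0A-01-78 : tag (field 1, length-delimited), varint length, payload;
        // the payload must be buffered until its length is known
        Console.WriteLine(BitConverter.ToString(ToBytes(new WithPrefix { Child = child })));
        // 0B-0A-01-78-0C : start-group tag, payload, end-group tag;
        // the payload can be streamed without buffering
        Console.WriteLine(BitConverter.ToString(ToBytes(new WithGroup { Child = child })));
    }
}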

Deserialization in protobuf-net is very forgiving (there is an optional strict mode, but it is not the default), and it will happily deserialize groups in place of a length prefix, and a length prefix in place of groups (meaning: any data you have already stored somewhere should still work fine).
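As a quick illustration of that claim, a sketch reusing the hypothetical types from above - data written with the default length-prefixed layout read back through the group-marked type:

using (var ms = new MemoryStream())
{
    // write with the default length-prefixed layout...
    Serializer.Serialize(ms, new WithPrefix { Child = new Child { Name = "x" } });
    ms.Position = 0;
    // ...and read it back through the group-marked type; the reader
    // accepts either wire form for the field
    WithGroup copy = Serializer.Deserialize<WithGroup>(ms);
    Console.WriteLine(copy.Child.Name); // prints "x"
}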

+6

1.2 GB of memory is dangerously close to the managed-memory limit for 32-bit .NET processes. I suspect the serialization triggers an OutOfMemoryException and then all hell breaks loose.

You should try several smaller serializations rather than one giant one, or move to a 64-bit process.
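One way to do several smaller serializations with protobuf-net is to write each object as its own length-prefixed message and read them back lazily. A sketch, assuming a [ProtoContract] User type and a sequence named users (both placeholders for your own types):

using (var file = File.Create("users.bin"))
{
    foreach (User user in users)
    {
        // each user becomes a small, independently framed message
        Serializer.SerializeWithLengthPrefix(file, user, PrefixStyle.Base128, 1);
    }
}

using (var file = File.OpenRead("users.bin"))
{
    // streams the objects back one at a time, so the full 1.2 GB graph
    // never has to be materialized in memory at once
    foreach (User user in Serializer.DeserializeItems<User>(file, PrefixStyle.Base128, 1))
    {
        // process user
    }
}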

Cheers, Florian

+1
