I would expect protobuf-net to be faster even for small objects... but you can try my Protocol Buffers port as well. I haven't used Marc's port for a while - mine was faster when I last benchmarked them, but I know his has been rewritten extensively since then :)
I doubt you'll manage to serialize 100 million elements in 100 ms whatever you do, though... I think that's simply an unreasonable expectation, especially if the data is being written to disk. (Obviously, if you just overwrite the same piece of memory repeatedly you'll get much better performance than serializing to disk, but I doubt that's really what you're trying to do.)
If you can give us more context, we can help more. For example, can you spread the load over several machines? (Multiple cores sharing the same I/O device are unlikely to help much, as I wouldn't expect this to be a CPU-bound operation if it's writing to disk or the network.)
EDIT: Suppose each object has 10 doubles (8 bytes each) and a ulong identifier (8 bytes). That's at least 88 bytes per object, so you're trying to serialize 8.8 GB in 100 ms. I really don't think that's achievable, whatever you use.
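The arithmetic above can be sketched as a quick sanity check. This assumes the figures from the answer: 10 doubles plus a ulong per object (a ulong is 8 bytes in C#) and 100 million objects in a 100 ms budget.

```python
# Back-of-envelope check of the serialization budget discussed above.
# Assumed figures: 10 doubles (8 bytes each) + 1 ulong (8 bytes) per object,
# 100 million objects, 100 ms time budget.

BYTES_PER_DOUBLE = 8
BYTES_PER_ULONG = 8
DOUBLES_PER_OBJECT = 10

object_size = DOUBLES_PER_OBJECT * BYTES_PER_DOUBLE + BYTES_PER_ULONG  # 88 bytes
object_count = 100_000_000
budget_seconds = 0.1

total_bytes = object_size * object_count            # raw payload, before any framing
required_throughput = total_bytes / budget_seconds  # bytes/second needed to hit the budget

print(f"{object_size} bytes/object, {total_bytes / 1e9:.1f} GB total")
print(f"required throughput: {required_throughput / 1e9:.0f} GB/s")
```

Even ignoring serialization overhead entirely, that works out to tens of GB/s of sustained write throughput, which is well beyond what a disk or network link will give you.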
I do run the Protocol Buffers benchmarks now and then (they report bytes per second), but I very much doubt they'll show you what you're hoping for.
Jon Skeet