I am writing an application that needs to quickly deserialize millions of messages from a single file.
The application essentially reads one message from the file, does some work on it, and then throws the message away. Each message has about 100 fields; not all of them are parsed every time, but I need all of them to be available, because the user decides which fields to work with.
At the moment, the application is just a loop that calls readDelimitedFrom() on each iteration.
Is there a way to restructure this to better fit my use case (splitting the data into multiple files, etc.)? In addition, because of the number of messages and the size of each one, I currently have to gzip the file. Compression is quite effective at reducing the size, since field values are highly repetitive, but it reduces read performance.
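To make the setup concrete, here is a minimal sketch of this kind of read path, not the actual application code. It assumes the file uses the usual delimited layout (a varint length prefix followed by the message bytes) and uses only JDK classes. The class name `DelimitedGzipReader`, its method names, and the 64 KiB buffer sizes are hypothetical choices for illustration.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: iterates over length-delimited frames in a gzip stream.
final class DelimitedGzipReader {

    // Wraps the raw stream so that decompression and reads both use
    // 64 KiB buffers (an assumed size, worth tuning by measurement).
    static InputStream open(InputStream raw) throws IOException {
        return new BufferedInputStream(new GZIPInputStream(raw, 1 << 16), 1 << 16);
    }

    // Decodes a base-128 varint whose first byte has already been read.
    static int readVarint32(int firstByte, InputStream in) throws IOException {
        int result = firstByte & 0x7F;
        int b = firstByte;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            if (shift >= 32) {
                throw new IOException("malformed length prefix");
            }
            b = in.read();
            if (b == -1) {
                throw new EOFException("truncated length prefix");
            }
            result |= (b & 0x7F) << shift;
        }
        return result;
    }

    // Returns the bytes of the next message, or null at a clean end of stream.
    static byte[] nextFrame(InputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            return null;
        }
        int len = readVarint32(first, in);
        if (len < 0) {
            throw new IOException("negative frame length");
        }
        byte[] frame = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(frame, off, len - off);
            if (n == -1) {
                throw new EOFException("truncated frame");
            }
            off += n;
        }
        return frame;
    }
}
```

Each returned frame could then be handed to the generated message class, for example via its `parseFrom(byte[])` method. Separating framing from parsing like this would also make it possible to measure how much time goes into decompression versus deserialization.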