If you zip a file named a.txt containing the text "hello" (5 bytes), the resulting archive will be around 115 bytes. Does that mean the zip format is inefficient at compressing text files? Of course not: there is overhead. If the file contained "hello" a hundred times (500 bytes), the zipped file would be around 120 bytes. 1x "hello" => 115 bytes, 100x "hello" => 120 bytes: we added 495 bytes, yet the compressed size grew by only 5 bytes.
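You can check this yourself; the exact byte counts depend on the zip tool and its settings, so treat the following as a rough sketch rather than a way to reproduce the numbers above. It builds the same two archives with Go's archive/zip and prints their sizes:

    package main

    import (
        "archive/zip"
        "bytes"
        "fmt"
        "strings"
    )

    // zippedSize archives a single file named a.txt with the given content
    // and returns the size of the resulting zip archive in bytes.
    func zippedSize(content string) int {
        buf := &bytes.Buffer{}
        w := zip.NewWriter(buf)
        f, err := w.Create("a.txt")
        if err != nil {
            panic(err)
        }
        f.Write([]byte(content))
        w.Close()
        return buf.Len()
    }

    func main() {
        fmt.Println("1x hello:  ", zippedSize("hello"))
        fmt.Println("100x hello:", zippedSize(strings.Repeat("hello", 100)))
    }

The absolute sizes it prints will differ from the 115/120 figures above, but the gap between the two archives will be only a handful of bytes, which is the point.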
Something similar happens with the encoding/gob package:
The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.
When you serialize a value of a type for the first time, the type definition also has to be included/transmitted, so the decoder can properly interpret and decode the stream:
A stream of gobs is self-describing. Each data item in the stream is preceded by a specification of its type, expressed in terms of a small set of predefined types.
Back to your example:
    var buf bytes.Buffer
    enc := gob.NewEncoder(&buf)
    e := Entry{"k1", "v1"}
    enc.Encode(e)
    fmt.Println(buf.Len())
It prints:
48
Now let's encode a few more values of the same type:
    enc.Encode(e)
    fmt.Println(buf.Len())
    enc.Encode(e)
    fmt.Println(buf.Len())
Now the output:
    60
    72
Try it on the Go Playground.
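Just to show the other direction as well, here is a minimal decoding sketch of my own, continuing the snippets above: a single Decoder reads all three values back from the same buffer, and the type specification is read only once.

    dec := gob.NewDecoder(&buf)
    for i := 0; i < 3; i++ {
        var e2 Entry
        if err := dec.Decode(&e2); err != nil {
            panic(err)
        }
        fmt.Println(e2) // prints {k1 v1} each time
    }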
Analysis of the results:
Encoding additional values of the same Entry type costs only 12 bytes each, while the first one is 48 bytes because the type specification is also transmitted (that's roughly 26 bytes), but that is a one-time overhead.
So basically you are transmitting 2 strings, "k1" and "v1", which amount to 4 bytes of data, and the lengths of the strings also have to be included; using 4 bytes for each length (the size of int on 32-bit architectures) gives you the 12 bytes, which is the "minimum". (Yes, you could use a smaller type for the length, but that would have its limitations. A variable-length encoding would be a better choice for small numbers; see the encoding/binary package.)
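To illustrate that last remark about variable-length encoding, here is a small sketch of my own (not part of the example above) using the unsigned varint from encoding/binary; small numbers take a single byte instead of a fixed 4 or 8:

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    func main() {
        buf := make([]byte, binary.MaxVarintLen64)
        for _, v := range []uint64{2, 300, 1000000} {
            n := binary.PutUvarint(buf, v)
            fmt.Printf("%7d encodes to %d byte(s)\n", v, n)
        }
    }

This prints 1, 2 and 3 bytes respectively, since a varint spends 7 bits of payload per byte.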
All in all, encoding/gob does a pretty good job for your needs. Don't be fooled by first impressions.
If these 12 bytes per Entry are still too much for you, you can always wrap the stream in a compress/flate or compress/gzip writer to further reduce the size (in exchange for slower encoding/decoding and slightly higher memory requirements for the process).
Demonstration:
Let me test 3 solutions:
- Using "naked" output (no compression)
- Using compress/flate to compress the output of encoding/gob
- Using compress/gzip to compress the output of encoding/gob
We will write a thousand Entry records, varying the keys and values of each: "k000", "v000", "k001", "v001", and so on. This means the uncompressed size of one Entry is 4 bytes + 4 bytes + 4 bytes + 4 bytes = 16 bytes (2x 4 bytes of string data, 2x 4 bytes for the string lengths).
The code is as follows:
    names := []string{"Naked", "flate", "gzip"}

    for _, name := range names {
        buf := &bytes.Buffer{}

        // Pick the output: the buffer itself, or a compressor writing into it.
        var out io.Writer
        switch name {
        case "Naked":
            out = buf
        case "flate":
            out, _ = flate.NewWriter(buf, flate.DefaultCompression)
        case "gzip":
            out = gzip.NewWriter(buf)
        }

        enc := gob.NewEncoder(out)
        e := Entry{}
        for i := 0; i < 1000; i++ {
            e.Key = fmt.Sprintf("k%3d", i)
            e.Val = fmt.Sprintf("v%3d", i)
            enc.Encode(e)
        }

        // The flate and gzip writers must be closed to flush any buffered data.
        if c, ok := out.(io.Closer); ok {
            c.Close()
        }

        fmt.Printf("[%5s] Length: %5d, average: %5.2f / Entry\n",
            name, buf.Len(), float64(buf.Len())/1000)
    }
Output:
    [Naked] Length: 16036, average: 16.04 / Entry
    [flate] Length:  4123, average:  4.12 / Entry
    [ gzip] Length:  4141, average:  4.14 / Entry
Try it on the Go Playground.
As you can see, the "naked" output is 16.04 bytes/Entry, only slightly more than the estimated size (due to the tiny one-time stream overhead discussed above).
When you use flate or gzip to compress the output, you can reduce it to about 4.13 bytes/Entry, which is roughly 26% of the theoretical size; I'm sure that satisfies you. (Note that with "real" data the compression ratio would likely be a lot worse, since the keys and values I used in the test are very similar and therefore compress extremely well; even so, the ratio should be around 50% with real data.)
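For completeness, here is a rough sketch of my own for the reading side of the gzip variant (assuming buf holds the gzip-compressed output from the demo above); the flate case is analogous, using flate.NewReader:

    zr, err := gzip.NewReader(buf)
    if err != nil {
        panic(err)
    }
    dec := gob.NewDecoder(zr)
    for {
        var e Entry
        if err := dec.Decode(&e); err != nil {
            if err == io.EOF {
                break // clean end of the gob stream
            }
            panic(err)
        }
        // use e.Key and e.Val here
    }
    zr.Close()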