Is there a JavaScript notation format more compact than JSON / BSON?

Is there a good serialization / deserialization format for simple JavaScript object trees that is significantly smaller than JSON? BSON is not very impressive.

JSON's overhead matters most for trees in which many objects share the same set of properties. In theory, it should be possible to detect those shared schemas in object arrays so that property names are not repeated for every object.

3 answers

You can convert your JSON into a more "database-like" format and then translate it back into regular objects on the other end. The savings may be worth the effort.

    // Typed on the fly
    var dict = [
        ["FirstName", "LastName"],
        ["Ken", "Starr"],
        ["Kermit", "Frog"]
    ];

Then, on the receiving side, you can decode the dictionary back into ordinary objects, for example:

    // Again, typed on the fly
    var headers = dict[0];
    var result = [];
    var o;
    for (var i = 1; i < dict.length; i++) {
        o = {};
        for (var j = 0; j < headers.length; j++) {
            o[headers[j]] = dict[i][j];
        }
        result.push(o);
    }
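
Going the other way, a minimal sketch of the encoding step (my addition, not part of the original answer; it assumes every object in the array shares the same set of properties) might look like this:

    // Build the header row from the first object's keys,
    // then emit one positional row per object.
    var objects = [
        { FirstName: "Ken", LastName: "Starr" },
        { FirstName: "Kermit", LastName: "Frog" }
    ];

    var headers = Object.keys(objects[0]);
    var dict = [headers];
    for (var i = 0; i < objects.length; i++) {
        var row = [];
        for (var j = 0; j < headers.length; j++) {
            row.push(objects[i][headers[j]]);
        }
        dict.push(row);
    }
    // dict is now [["FirstName","LastName"],["Ken","Starr"],["Kermit","Frog"]]
    // and JSON.stringify(dict) is what would actually go over the wire.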

Gzip is fast. Very fast. And I'm quite confident it is the best solution, in terms of both practicality and efficiency, for transporting your objects.

To illustrate this, I put together a quick example on one of my staging sites.

http://staging.svidgen.com/ajax/test-a.js generates 5k rows of simple data and returns the original, plain JSON.

    $data = array();
    for ($i = 0; $i < 5000; $i++) {
        $data[] = array(
            'value-a' => $i,
            'value-b' => pow($i, 2),
            'value-c' => pow($i, 3)
        );
    }
    print json_encode($data);

The response is 65 KB gzipped and takes 357 ms to request, build, serialize, and transmit. Leaving client-side parsing out of the equation, that is a throughput of roughly 182 KB/s (65 KB / 0.357 s). Given the 274 KB of raw data it represents, the effective throughput is 767 KB/s. The response looks like this:

 [{"value-a":0,"value-b":0,"value-c":0},{"value-a":1,"value-b":1,"value-c":1} /* etc. */] 

The alternative, http://staging.svidgen.com/ajax/test-b.js, generates the same 5k rows of simple data but restructures them into a more compact, indexed JSON serialization.

    $data = array();
    for ($i = 0; $i < 5000; $i++) {
        $data[] = array(
            'value-a' => $i,
            'value-b' => pow($i, 2),
            'value-c' => pow($i, 3)
        );
    }

    $out_index = array();
    $out = array();
    foreach ($data as $row) {
        $new_row = array();
        foreach ($row as $k => $v) {
            if (!isset($out_index[$k])) {
                $out_index[$k] = sizeof($out_index);
            }
            $new_row[$out_index[$k]] = $v;
        }
        $out[] = $new_row;
    }
    print json_encode(array(
        'index' => $out_index,
        'rows' => $out
    ));

The response is 59.4 KB gzipped and takes 515 ms to request, build, serialize, and transmit. Leaving client-side parsing out of the equation, that is a throughput of roughly 115 KB/s (59.4 KB / 0.515 s). Given the 128 KB of raw data transferred, the effective throughput is 248 KB/s. The response looks like this:

 {"index":{"value-a":0,"value-b":1,"value-c":2},"rows":[[0,0,0],[1,1,1] /* etc. */ ]} 

So, in this fairly simple example, the restructured data is about 50% smaller than the original raw data (128 KB vs. 274 KB), but only about 9% smaller once gzipped (59.4 KB vs. 65 KB). And the cost, in this case, is a 44% increase in total request time (515 ms vs. 357 ms).

If you wrote a compiled library to do the restructuring, I expect you could cut that 44% overhead down significantly. But it's still unlikely to be free. And because gzip already closes most of the gap, the restructured serialization would need to add no more than about 9% to the time of encoding the "normal" structure before you would see any net gain.

The only way to avoid the cost of restructuring, or of any "alternative serialization", is to work with all of your server-side objects in that awkward, indexed fashion from start to finish. And unless you genuinely need to squeeze every last ounce of performance out of your hardware, that's simply a bad idea.

And in both cases, the space savings from gzip are far greater than anything we can achieve with an alternative JavaScript serialization format.

(And that's without even counting the client-side parsing cost, which would be significant for anything that isn't "plain" JSON or XML.)

In conclusion

Just use the built-in serialization and gzip libraries.
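
In Node.js, for example (my illustration, not part of the original answer), that amounts to little more than:

    var zlib = require('zlib');

    var data = [{ 'value-a': 0, 'value-b': 0, 'value-c': 0 }];  // whatever needs to be sent
    var body = zlib.gzipSync(JSON.stringify(data));             // built-in serialization + built-in gzip

    // Send 'body' with "Content-Encoding: gzip"; the browser inflates it and JSON.parse()s the result.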

  • There are very fast compression libraries available, so it's not a given that a more compact format beats a simpler format plus compression.

    I don't have a link, but I think a former protobuf developer recommended this approach.

  • Check out MessagePack. It's a fairly compact binary format with semantics very similar to JSON or BSON.

    Short arrays (fewer than 16 elements), maps (fewer than 16 pairs), and strings (shorter than 32 bytes) need only a single byte of overhead, and small integers (roughly -32 to 127) are encoded in a single byte. A brief usage sketch follows below.
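
    A minimal usage sketch with one JavaScript MessagePack implementation (msgpack-lite here; the particular package is just an example) looks roughly like this:

        var msgpack = require('msgpack-lite');  // one of several MessagePack libraries for JavaScript

        var encoded = msgpack.encode({ FirstName: 'Ken', LastName: 'Starr' });  // Buffer of MessagePack bytes
        var decoded = msgpack.decode(encoded);                                  // back to a plain object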

