What are the pros and cons of storing JSON as blob vs. text in Cassandra?

One problem with blob for me is that in Java, ByteBuffer (which maps to blob in Cassandra) is not Serializable, and therefore does not work with EJB.

Given that the JSON is pretty big, what would be the best type for storing JSON in Cassandra: text or blob?

Does the size of the JSON matter when choosing between blob and text?

In any other database, for example Oracle, BLOB/CLOB would usually be used. But in Cassandra, where each cell can hold up to 2 GB, does it matter?

Please treat this question as a choice between text and blob for this case, rather than resorting to suggestions about whether JSON should be kept in a single column at all.

+7
java json cassandra blob
4 answers

I don't think there is any benefit to storing literal JSON data as a blob in Cassandra. At best your storage costs are identical, and in general the APIs are less convenient to work with for blob types, since they are designed around strings/text.

For example, if you use the Java driver, then to store the data as a blob with a parameterized PreparedStatement you first need to load it all into a ByteBuffer, for example by packaging your JSON data into an input stream.
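For illustration, here is a minimal sketch of that extra ceremony using the DataStax Java driver (3.x-era API; the keyspace demo and the tables events_text(id int, payload text) and events_blob(id int, payload blob) are hypothetical):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class TextVsBlobInsert {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {

                String json = "{\"user\":\"alice\",\"score\":42}";

                // With a text column, the driver accepts the String directly.
                PreparedStatement asText =
                        session.prepare("INSERT INTO events_text (id, payload) VALUES (?, ?)");
                session.execute(asText.bind(1, json));

                // With a blob column, you must first wrap the bytes in a ByteBuffer.
                PreparedStatement asBlob =
                        session.prepare("INSERT INTO events_blob (id, payload) VALUES (?, ?)");
                ByteBuffer buffer = ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8));
                session.execute(asBlob.bind(1, buffer));
            }
        }
    }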

Unless you are dealing with JSON fragments so large that you have to stream your data anyway, that is a fair bit of extra work just to get access to the blob type. And what would you gain from it? Essentially nothing.

However, I do think there is some merit to the question "Should I store JSON as text, or gzip it and store the compressed data as a blob?"

And the answer to that question comes down to how you have configured Cassandra and your table. In particular, as long as you are using Cassandra version 1.1 or later, your tables have compression enabled by default. That may already be sufficient, especially if your JSON data is fairly uniform across rows.
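For reference, a minimal sketch of tuning that table-level compression, again using the hypothetical events_text table. Note this uses the pre-3.0 CQL option names; Cassandra 3.0+ renamed them to {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64}:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class EnableTableCompression {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                // Cassandra compresses whole SSTable chunks, not individual cells.
                session.execute(
                        "ALTER TABLE events_text WITH compression = "
                        + "{'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64}");
            }
        }
    }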

However, Cassandra's built-in compression is applied table-wide, not to individual rows. So you may get a better compression ratio by manually compressing your JSON data before storage, writing the compressed bytes into a ByteBuffer, and then shipping the data to Cassandra as a blob.
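A minimal sketch of that manual step, using only the JDK's built-in gzip support (the class and method names here are my own):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public final class JsonCompressor {

        /** Gzips a JSON string and wraps the result in a ByteBuffer for a blob column. */
        public static ByteBuffer compress(String json) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(json.getBytes(StandardCharsets.UTF_8));
            }
            return ByteBuffer.wrap(out.toByteArray());
        }
    }

The returned buffer can then be bound to a blob column exactly as in the earlier insert sketch.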

So it essentially boils down to a trade-off of storage space and programming convenience against CPU usage. I would decide it as follows:

1. Is minimizing storage space your biggest concern?
    • If so, compress the JSON data and store the compressed bytes as a blob;
    • Otherwise, go to #2.
2. Is Cassandra's built-in compression available and enabled for your table?
    • If not (and you cannot enable it), compress the JSON data and store the compressed bytes as a blob;
    • Otherwise, go to #3.
3. Is the data you will be storing relatively uniform across rows?
    • For JSON data the answer is probably yes, in which case you should store the data as text and let Cassandra handle the compression;
    • Otherwise, go to #4.
4. Do you value efficiency or convenience more?
    • Efficiency: compress the JSON data and store the compressed bytes as a blob.
    • Convenience: compress the JSON data, base64-encode the compressed result, and store the base64 text in a text column (see the sketch after this list).
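For option 4's convenience path, a minimal sketch that builds on the JsonCompressor helper sketched above (java.util.Base64 requires Java 8):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.Base64;

    public final class CompressedJsonAsText {

        /** Gzips the JSON, then base64-encodes it so it fits in a text column. */
        public static String compressToText(String json) throws IOException {
            ByteBuffer compressed = JsonCompressor.compress(json); // helper sketched above
            byte[] bytes = new byte[compressed.remaining()];
            compressed.get(bytes);
            return Base64.getEncoder().encodeToString(bytes);
        }
    }

The base64 step costs roughly 33% extra storage on top of the compressed size, which is what you pay for avoiding the ByteBuffer handling.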
+12

Since the data is not binary, there is really no reason to use a Binary Large OBject. Of course you can do it, but why? Text is easier for humans to read, and in practice the difference in speed and size is negligible.

Even in other databases you can often store JSON as text. For example, even MySQL has text fields that can handle quite a bit of data (LONGTEXT = 4 GB). Yes, Oracle is lagging behind, but hopefully they will also get a reasonable long text type.

But why do you want to store the entire JSON object as text at all? JSON should really be normalized and stored as multiple fields in the database.

0

I would definitely say that text is better than blob for storing JSON. JSON is ultimately text, so that type makes sense, and there can also be extra overhead with blobs, since some of the drivers seem to require them to be converted to hex before they are inserted. In addition, blobs are displayed as hex-encoded strings in cqlsh, so you won't easily be able to verify that the JSON was actually stored correctly if you need to debug. I'm not sure exactly how blobs are laid out on disk, but I would guess it is very similar to text.

With that said, storing large values can cause problems and is not recommended. It can cause issues during compaction and consume a lot of memory. Although the FAQ refers to files larger than 64 MB, in my experience even values averaging a few megabytes can cause performance problems once you start storing a large number of them. If you expect your JSON to run to megabytes in size, it would be better to use an object store if possible and keep only references to it in Cassandra.

0
source share

The upcoming 2.2 release also adds built-in JSON support to Cassandra: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support
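For illustration, a minimal sketch of that new INSERT ... JSON / SELECT JSON syntax via the Java driver, assuming a hypothetical users(id int primary key, name text) table; SELECT JSON returns each row as a single JSON-encoded column:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class JsonSyntaxDemo {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                // Map a JSON document onto the columns of an ordinary table...
                session.execute("INSERT INTO users JSON '{\"id\": 1, \"name\": \"alice\"}'");
                // ...and read a row back as a JSON document.
                Row row = session.execute("SELECT JSON * FROM users WHERE id = 1").one();
                System.out.println(row.getString(0));
            }
        }
    }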

-1
