Are Cassandra user-defined types (UDTs) recommended in terms of performance?

I have a Cassandra customers table that will contain a list of customers. Each customer has an address, which is a set of standard fields:

{
    CustomerName: "",
    etc...,
    Address: {
        street: "",
        city: "",
        province: "",
        etc...
    }
}

My question is: if I have a million customers in this table and use a user-defined Address type to store the address information for each customer in the Customers table, what are the consequences of this model, especially in terms of disk space? Will it be very expensive? Should I use a user-defined type for the address, flatten the address fields into the table, or even use a separate table?
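For illustration, here is a minimal CQL sketch of the two models under discussion (the type, table, and column names are my own assumptions, not taken from the original schema):

-- Option 1: a user-defined type holding the address fields
CREATE TYPE address (
    street      text,
    city        text,
    province    text,
    postal_code text
);

CREATE TABLE customers (
    customer_id   uuid PRIMARY KEY,
    customer_name text,
    address       frozen<address>
);

-- Option 2: the same address fields flattened into the table
CREATE TABLE customers_flat (
    customer_id   uuid PRIMARY KEY,
    customer_name text,
    street        text,
    city          text,
    province      text,
    postal_code   text
);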

1 answer

Basically, what happens in this case is that Cassandra will serialize the address instances into a blob, which is stored as a single column in your customers table. I don't have any numbers on how much overhead the serialization adds in terms of disk or CPU usage, but it probably won't make much of a difference for your use case. You should benchmark both options.
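To make the "single column" point concrete, a hedged example reusing the assumed schema above: a frozen UDT value is written and read back as one serialized cell.

-- The whole address is one cell: inserting it writes one serialized value
INSERT INTO customers (customer_id, customer_name, address)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'Jane Doe',
        { street: '1 Main St', city: 'Toronto', province: 'ON', postal_code: 'M1M 1M1' });

-- Reading it back returns the address as a single value
SELECT customer_name, address FROM customers
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;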

Edit: Another aspect I should have mentioned: treating UDTs as single blobs implies replacing the full UDT value on any update. This is less efficient than updating individual columns and is a potential source of inconsistencies: with concurrent updates, writers can overwrite each other's changes. See CASSANDRA-7423.
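A sketch of that update difference, again under the assumed schema (note that Cassandra 3.0+, via CASSANDRA-7423, adds non-frozen UDTs whose individual fields can be updated, but a frozen UDT always behaves as below):

-- Frozen UDT: the entire address must be rewritten to change one field,
-- so concurrent writers can silently overwrite each other's changes
UPDATE customers
SET address = { street: '2 Oak Ave', city: 'Toronto', province: 'ON', postal_code: 'M1M 1M1' }
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;

-- Flattened columns: a single column can be updated on its own, and
-- concurrent updates to different fields merge instead of clobbering
UPDATE customers_flat
SET street = '2 Oak Ave'
WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000;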

