One approach to speeding up your database is denormalization. Take this MySQL example:
CREATE TABLE `users` (
  `user_id` INT NOT NULL AUTO_INCREMENT,
  …  -- Additional user data
  PRIMARY KEY (`user_id`)
);

CREATE TABLE `roles` (
  `role_id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(64),
  PRIMARY KEY (`role_id`)
);

CREATE TABLE `users_roles` (
  `user_id` INT NOT NULL,
  `role_id` INT NOT NULL,
  PRIMARY KEY (`user_id`, `role_id`)
);
Clean, neat, normalized. But if you want to fetch users along with their roles, the query gets more complicated:
SELECT u.*, r.*
FROM `users` u
LEFT JOIN `users_roles` ur ON u.`user_id` = ur.`user_id`
LEFT JOIN `roles` r ON ur.`role_id` = r.`role_id`;
If you denormalize it, it might look something like this:
CREATE TABLE `users` (
  `user_id` INT NOT NULL AUTO_INCREMENT,
  `role` VARCHAR(64),
  …  -- Additional user data
  PRIMARY KEY (`user_id`)
);
And the equivalent query:
SELECT * FROM `users`;
This improves some performance characteristics of your queries:
- Since the result you want is already in the table, you do not need to perform calculations at read time. For example, to count the users with a given role in the normalized schema, you need a GROUP BY and a COUNT. In the denormalized version, you could keep that number in a dedicated table that stores each role and the count of users who have it (see the sketch after this list).
- The data you want is all in one place, and hopefully in one place on disk. Instead of a lot of random reads, the query can be served with a few sequential reads.
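For instance, here is a minimal sketch of the role-count case, assuming a hypothetical `role_user_counts` table that the application keeps up to date at write time:

-- Normalized schema: counting users per role means a join and aggregation at read time
SELECT r.`name`, COUNT(*) AS `user_count`
FROM `roles` r
JOIN `users_roles` ur ON r.`role_id` = ur.`role_id`
GROUP BY r.`name`;

-- Denormalized alternative (hypothetical table, maintained on every role change)
CREATE TABLE `role_user_counts` (
  `role` VARCHAR(64) NOT NULL,
  `user_count` INT NOT NULL DEFAULT 0,
  PRIMARY KEY (`role`)
);

-- The read becomes a simple key lookup
SELECT `user_count` FROM `role_user_counts` WHERE `role` = 'admin';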
NoSQL databases are heavily optimized for exactly this case, where you want to read one big, mostly static dataset sequentially. At that point the database is essentially just moving bytes from disk to the network. Less work, less overhead, more speed. As simplistic as that sounds, you can model your data and your application so that this is the natural access pattern.
The tradeoff for this read performance is write load, disk space, and some application complexity. Denormalizing your data means keeping more copies of it, which means more disk space and more writes. Essentially, you end up with one dataset per query. And since you are shifting the cost of these calculations from read time to write time, you usually need some kind of asynchronous mechanism to keep the copies up to date, which is where the application complexity comes from. A minimal sketch of that write path follows.
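To illustrate the write-time cost, here is a sketch in plain MySQL, reusing the hypothetical `role_user_counts` table from above. In a real system the extra updates would typically happen asynchronously (for example via a queue or background job); they are shown in one transaction only to keep the sketch short.

-- Assigning the role 'admin' to user 42 now means updating every copy of that fact
START TRANSACTION;

UPDATE `users`
SET `role` = 'admin'
WHERE `user_id` = 42;

-- Keep the denormalized count in sync
-- (a complete version would also decrement the count for the user's previous role)
INSERT INTO `role_user_counts` (`role`, `user_count`)
VALUES ('admin', 1)
ON DUPLICATE KEY UPDATE `user_count` = `user_count` + 1;

COMMIT;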
And since you are storing more copies, you have to perform more write operations. That is why you can't really pull off this kind of architecture with a SQL database: writes are extremely hard to scale there.
In my experience, the tradeoff is well worth it at large scale. If you want to read a bit more about applying Cassandra in practice, I wrote this article a few months ago, and you may find it useful.