How to implement schema changes in a NoSQL storage system

Question

How do you manage major schema changes when using a NoSQL store such as SimpleDB?

I know I am still thinking in SQL terms, but after working with SimpleDB for several weeks, I need to make changes to the existing database. I would like to change one of the object classes so that it is keyed by a unique identifier rather than by the company name, and since another object class refers to it, I will also need to update the reference values in those objects.

With an SQL database, you would run a set of SQL statements as part of the client software deployment process. Obviously this will not work with something like SimpleDB, because:

There is no equivalent to the SQL UPDATE statement.
Due to the distributed nature of SimpleDB, it is impossible to know when the changes you have made to the database have propagated to all the nodes running your client software.

Some solutions I have been considering:

Each domain has a version number. The client software knows which version of the domain it should use. Write code that copies data from one version of a domain to the next, making the necessary changes along the way. You can then deploy new client software that accesses the new version of the domain. This approach will not work unless you can freeze all writes during the upgrade process.
Each item has a version attribute that indicates the format used when it was saved. The client uses this attribute when loading an object into memory. The object can then be converted to the latest format when it is written back to SimpleDB. The problem is that the new software must be deployed to all servers before anything is written in the new format; otherwise clients running the old software will not know how to read it.
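
To make the second option concrete, here is a minimal sketch of upgrading an item on load. It uses plain dictionaries in place of real SimpleDB items, and the names `version`, `company_name`, `company_id` and the rule for deriving the id are all assumptions made up for illustration, not part of any SimpleDB API.

```python
CURRENT_VERSION = 2  # format written by this (hypothetical) release


def upgrade_v1_to_v2(item):
    """Hypothetical upgrade step: v2 adds a company_id attribute."""
    upgraded = dict(item)
    # Illustrative rule only: derive an id from the old company name.
    upgraded["company_id"] = "c-" + item["company_name"].lower().replace(" ", "-")
    upgraded["version"] = 2
    return upgraded


# One upgrade function per stored version, keyed by the version it accepts.
UPGRADES = {1: upgrade_v1_to_v2}


def load(item):
    """Return the item in the current format, upgrading step by step."""
    # Items saved before the version attribute existed count as version 1.
    version = int(item.get("version", 1))
    if version > CURRENT_VERSION:
        # This is the problem described above: old code cannot read new data.
        raise ValueError("item was written by newer software")
    while version < CURRENT_VERSION:
        item = UPGRADES[version](item)
        version = int(item["version"])
    return item
```

Calling `load({"company_name": "Acme Corp"})` would return the item with `company_id` and `version` filled in; the upgraded form only reaches the store when the object is next written back.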

All of this seems quite complicated, and I wonder whether I am missing something.

Thanks

Richard

+8

nosql amazon-simpledb

richard Aug 30 '11 at 5:42

2 answers

RavenDB, another NoSQL database, uses migrations to achieve this:

http://ayende.com/blog/66563/ravendb-migrations-rolling-updates

http://ayende.com/blog/66562/ravendb-migrations-when-to-execute

Typically, these kinds of changes are handled by your application: when it loads a document stored at version X, it converts it to the newer version Y and saves it.

+1

Justin king Aug 30 '11 at 5:46

Tom clarkson · Accepted Answer · 2011-08-31T00:39:20+0000

I use something similar to your second option, but without a version attribute.

First, try to restrict your changes to things that are easy to make backward compatible - changing the primary key is the worst-case scenario.

Deleting a field is very simple - just stop writing to that field once a version that does not require it is running on all servers.

Adding a field requires that you never write the object with code that will not save that field. If you cannot deploy the new version everywhere at once, use an intermediate version that supports saving the field before you deploy the version that requires it.
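
One way such an intermediate version might look is sketched below: it does not understand the new field, but it carries unrecognised attributes through a load/save round trip instead of dropping them. The `Customer` class and its attribute names are hypothetical, items are plain dictionaries, and this only matters where a save rewrites the whole record.

```python
class Customer:
    """Hypothetical model for an intermediate release."""

    KNOWN_FIELDS = ("name", "email")

    def __init__(self, name, email, extra=None):
        self.name = name
        self.email = email
        # Attributes this release does not understand but must not lose.
        self.extra = dict(extra or {})

    @classmethod
    def from_item(cls, item):
        extra = {k: v for k, v in item.items() if k not in cls.KNOWN_FIELDS}
        return cls(item["name"], item.get("email", ""), extra)

    def to_item(self):
        # Start from the unknown attributes so they are written back unchanged.
        item = dict(self.extra)
        item["name"] = self.name
        item["email"] = self.email
        return item
```

With this in place, an item that a newer release saved with an extra `company_id` attribute still has that attribute after the intermediate release edits and saves it.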

Changing a field is simply a combination of these two operations.

With this approach, changes are applied as needed: records are written in the new format, while reads still accept the old format, using default or derived values for the new field.
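
As a rough illustration of reading old records without a version attribute, the sketch below fills in a derived value and a default when the new fields are absent. The field names (`first_name`, `last_name`, `display_name`, `status`) are invented for the example.

```python
def read_contact(item):
    """Return a contact in the new shape, whatever shape it was stored in."""
    record = dict(item)
    # Derived value: build the new field from old fields if it is missing.
    if "display_name" not in record:
        first = record.get("first_name", "")
        last = record.get("last_name", "")
        record["display_name"] = (first + " " + last).strip()
    # Default value: old records never stored a status.
    record.setdefault("status", "active")
    return record
```

Saving whatever `read_contact` returns is what gradually moves the stored data to the new format.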

You can use the same code to update all records at the same time, although this may not be acceptable for a large dataset.

Changing the primary key can be handled the same way, but it can become really complicated depending on which NoSQL system you are using. You will probably be stuck writing custom migration code.
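
For the case in the question, custom migration code might look roughly like the sketch below. It assumes companies are keyed by name, contacts hold a `company` attribute containing that name, and both collections fit in memory as dictionaries; all of these names are hypothetical, and it ignores writes that arrive while the migration runs.

```python
import uuid


def migrate_keys(companies, contacts, new_id=lambda: str(uuid.uuid4())):
    """Re-key companies by a generated id and rewrite contact references.

    companies: {company_name: attributes}
    contacts:  {contact_key: attributes with a 'company' name reference}
    """
    id_by_name = {}
    new_companies = {}
    for name, attrs in companies.items():
        company_id = new_id()
        id_by_name[name] = company_id
        # Keep the name as an ordinary attribute under the new key.
        new_companies[company_id] = dict(attrs, name=name)

    new_contacts = {}
    for key, attrs in contacts.items():
        updated = dict(attrs)
        old_name = updated.pop("company")
        # Raises KeyError if a contact refers to a company that does not exist.
        updated["company_id"] = id_by_name[old_name]
        new_contacts[key] = updated
    return new_companies, new_contacts
```

The results would then be written to the store (or to a new domain, as in the first option in the question) by whatever client library you use.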
