I think a good solution is to use virtual shards. You can start with a single server and point all of the virtual shards at it. Use the modulo of an incremental ID to distribute rows evenly across the virtual shards. Amazon RDS gives you the ability to promote a replica to master, so before changing the sharding configuration (assigning some of the virtual shards to the new server), promote the replica to master, update your configuration file, and then delete from the new master every row whose ID modulo falls outside the virtual-shard range you assigned to the new instance.
You also need to delete the corresponding rows from the original server, but from now on all new data whose ID modulo falls within the reassigned virtual-shard ranges goes to the new server. This way you never have to move data; you just use the Amazon RDS promotion feature.
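A minimal sketch of the idea in Python; the shard count, hostnames, and table name are illustrative assumptions, not part of the original setup:

```python
# Hypothetical virtual-shard map: all virtual shards start on one physical server.
NUM_VIRTUAL_SHARDS = 4096  # Pinterest-style fixed shard count

# Virtual-shard ranges -> physical server (names are made up).
shard_map = {
    range(0, 4096): "db-master-1.example.com",
}

def virtual_shard_for(record_id: int) -> int:
    """Distribute rows evenly across virtual shards via modulo."""
    return record_id % NUM_VIRTUAL_SHARDS

def server_for(record_id: int) -> str:
    """Resolve which physical server owns a row's virtual shard."""
    shard = virtual_shard_for(record_id)
    for shard_range, server in shard_map.items():
        if shard in shard_range:
            return server
    raise LookupError(f"no server owns virtual shard {shard}")

# After promoting the RDS replica to master, split the map in two:
shard_map = {
    range(0, 2048): "db-master-1.example.com",
    range(2048, 4096): "db-master-2.example.com",  # the promoted replica
}

# Cleanup: each server then deletes the rows it no longer owns, e.g. on the
# new master (assuming a `users` table):
#   DELETE FROM users WHERE id % 4096 < 2048;
# and on the original server:
#   DELETE FROM users WHERE id % 4096 >= 2048;
```

The point of the fixed virtual-shard count is that rebalancing only ever moves range assignments in the map; the modulo of an ID never changes.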
Then you can create a replica from the source server again, ready for the next split. For identifiers, you compose a unique ID out of shard ID + table type ID + an incremental number, so when you run a query you know from the ID itself which shard to go to for the data.
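A hedged sketch of such a composite ID in Python; the 64-bit layout and bit widths below are assumptions for illustration (similar in spirit to Pinterest's published scheme), not a standard:

```python
# Composite ID = shard ID | table-type ID | local incremental number.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36  # assumed widths, 62 bits total

def make_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Pack the three parts into a single integer ID."""
    assert shard_id < (1 << SHARD_BITS)
    assert type_id < (1 << TYPE_BITS)
    assert local_id < (1 << LOCAL_BITS)
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def parse_id(composite: int) -> tuple[int, int, int]:
    """Recover (shard_id, type_id, local_id) so a query knows which shard to hit."""
    local_id = composite & ((1 << LOCAL_BITS) - 1)
    type_id = (composite >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = composite >> (TYPE_BITS + LOCAL_BITS)
    return shard_id, type_id, local_id

# Example: row 42 of table type 1 (say, users) on virtual shard 3051.
uid = make_id(3051, 1, 42)
assert parse_id(uid) == (3051, 1, 42)
```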
I do not know whether this can be done with RavenDB, but it works very well with Amazon RDS, because Amazon already provides the replication and promotion features for you.
I agree that their solution should offer seamless scaling from the very beginning instead of leaving the developer to sort out the problems once they happen. I have also found that many NoSQL solutions that distribute data evenly across shards need to run in a low-latency cluster, so you must take that into account. I tried Couchbase on two separate EC2 machines (not in a dedicated Amazon cluster), and data rebalancing was very slow; it also increases the overall cost.
I also want to add that Pinterest solved its scalability problems the same way, using 4,096 virtual shards.
You also need to look into paging issues, as with many NoSQL databases: with this approach you can page through the data, but perhaps not in the most efficient way, because you may need to query multiple databases. Another problem is schema changes. Pinterest solved this by storing all the data in a JSON blob in MySQL. When you want to add a new column, you create a new table holding the new column's data plus an ID key, and you put an index on that column. If you need to query data by, say, email, you create another table mapping email + ID and put the index on the email column. Atomic counters are another problem; it is better to pull those counters out of the JSON and store them in a real column so you can increment their value atomically.
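A rough sketch of that layout, runnable in Python; the table and column names are made up for illustration, and sqlite3 stands in for MySQL so the example is self-contained:

```python
import json
import sqlite3  # stand-in for MySQL so the sketch runs as-is

conn = sqlite3.connect(":memory:")

# Main table: all fields live in a JSON blob, with counters pulled out into
# a real column so they can be incremented atomically.
conn.execute("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        data        TEXT NOT NULL,               -- JSON blob with all user fields
        login_count INTEGER NOT NULL DEFAULT 0   -- atomic counter, outside the JSON
    )
""")

# "New column" pattern: instead of ALTER TABLE, add a lookup table mapping
# the new value to the row ID, and index the value column.
conn.execute("""
    CREATE TABLE user_emails (
        email   TEXT NOT NULL,
        user_id INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_user_emails_email ON user_emails (email)")

# Insert a user and the email mapping.
conn.execute("INSERT INTO users (id, data) VALUES (?, ?)",
             (1, json.dumps({"name": "Alice", "email": "alice@example.com"})))
conn.execute("INSERT INTO user_emails (email, user_id) VALUES (?, ?)",
             ("alice@example.com", 1))

# Atomic counter update, deliberately not touching the JSON blob.
conn.execute("UPDATE users SET login_count = login_count + 1 WHERE id = ?", (1,))

# Query by email via the indexed lookup table.
(user_id,) = conn.execute("SELECT user_id FROM user_emails WHERE email = ?",
                          ("alice@example.com",)).fetchone()
print(user_id)  # -> 1
```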
There are great off-the-shelf solutions out there, but at the end of the day you will find that they can be very expensive. I preferred to spend the time building my own sharding solution and prevent the headaches later. If you take the other path, there are plenty of companies waiting for you to get into trouble, ready to charge quite a lot of money to solve your problems, because at the moment you need them, they know you will pay anything to get your project working again. From my own experience, that is why I racked my brains to build my own sharding solution using this approach, which is also much cheaper.
Another option is to use MySQL middleware such as ScaleBase or DBshards. You can keep working with plain MySQL, and when the time comes to scale, they handle it well, and the cost can be much lower than the alternatives.
One more tip: when building the shard configuration, give each shard a write_lock attribute that accepts true or false. When it is true, no data is written to that shard, so when you fetch the list of shards for a given table type (for example, users), writes only go to the other shards of the same type. This is also handy for backups: you can show a friendly error to visitors while you write-lock all the shards and take snapshots of every shard for a consistent backup. Although I think with Amazon RDS you can issue a global request to snapshot all the databases and rely on scheduled backups.
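A minimal sketch of that configuration in Python, assuming write_lock=True means the shard is skipped for writes; the hostnames, table type, and helper names are illustrative:

```python
# Hypothetical shard config with a per-shard write_lock flag.
SHARDS = {
    "users": [
        {"host": "db-1.example.com", "range": range(0, 2048),    "write_lock": False},
        {"host": "db-2.example.com", "range": range(2048, 4096), "write_lock": True},
    ],
}

def writable_shards(table_type: str) -> list[dict]:
    """Return only the shards of this table type that currently accept writes."""
    return [s for s in SHARDS[table_type] if not s["write_lock"]]

def lock_all(table_type: str) -> None:
    """Write-lock every shard, e.g. before snapshotting all of them for backup."""
    for s in SHARDS[table_type]:
        s["write_lock"] = True

print([s["host"] for s in writable_shards("users")])  # -> ['db-1.example.com']
```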
The fact is that most companies will not spend time on a DIY sharding solution; they will prefer to pay for something like ScaleBase. Those who build their own are mostly individual developers who cannot afford to pay for a scaling solution from the start but want to be sure they will have one by the time they reach the level where they need it. Just look at the prices out there and you will understand it will cost you a lot. I will be happy to share my code with you once I am done. In my opinion you are on the right path; it all depends on your application logic. I model my database to be simple, with no complex aggregation queries, and this solves many of my problems. In the future you can use MapReduce to handle those big-data queries.