Generating Identifiers for an Azure Federated Database

Question


I have searched for articles or best-practice guidelines on generating IDs (for the federation key / primary key) in Azure Federated databases and have not found anything convincing. Federated tables do not support identity columns, so it seems to me that the only practical identifier type is a GUID, since generating BigInts centrally creates a single point of failure in the application. My main concern is the consequences of using a GUID instead of a BigInt (especially for table indexing).

Is there a recommended practice (or an existing library) for generating unique BigInts in a distributed system? Or should I simply not worry about the consequences of using a GUID?
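One widely used technique for coordination-free BigInts is a Twitter Snowflake-style generator. It is not something the Azure documentation prescribes. Each application instance packs a millisecond timestamp, an instance-specific node ID, and a per-millisecond counter into one 64-bit integer. The sketch below assumes a 41/10/12-bit layout, an arbitrary custom epoch, and that node IDs are assigned uniquely by some other means. The class name and constants are hypothetical.

```python
import threading
import time

# Assumed layout: 41 bits of ms since EPOCH_MS, 10 bits node ID, 12 bits sequence.
EPOCH_MS = 1_325_376_000_000  # 2012-01-01T00:00:00Z, an arbitrary custom epoch
NODE_BITS = 10
SEQ_BITS = 12
MAX_NODE = (1 << NODE_BITS) - 1
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeIdGenerator:
    """Hypothetical generator of roughly time-ordered 63-bit IDs (fits a signed
    BIGINT). IDs are unique only if every instance gets a distinct node_id."""

    def __init__(self, node_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id <= MAX_NODE:
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock went backwards: refuse rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS))
                    | (self.node_id << SEQ_BITS)
                    | self._seq)
```

Because the timestamp occupies the high bits, IDs from one instance increase over time. That keeps inserts near the end of an index. The trade-off is a dependency on reasonably synchronized clocks and on unique node IDs.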

[Update]

After reading a lot more about this since posting the question, it seems to me that key generation is going to be a problem in Azure. A Microsoft blog post recommends using a GUID as the federation key. However, it does not mention that every index (including the clustered index) on a federated table must include the federation key. This means every one of these indexes will contain a GUID, which will kill insert performance.

The apparent alternative is a centralized key generation service (as Simon mentions below), which has its own drawbacks: it is a potential bottleneck and a central point of failure.

I would have expected Microsoft to give more guidance on this, since it is surely a problem that everyone who creates federated tables will face!

On balance, I have decided to go with a centralized key generation service, but it bothers me a bit. If anyone has some kind of magic technique, I would love to hear it (or let me know if I have missed something obvious)!


.net azure azure-sql-database sharding

Mike hanrahan Feb 16 '12 at 10:35


4 answers

When choosing your federation key, it is important to pick a key that actually leads to a good distribution of work across the federation members, so in many cases a generated identifier is not a good idea. For example, federating on the order ID means that all recent orders end up in the last federation member, which is likely where most users are working, so the benefits of federation are significantly reduced. Partitioning by country, customer ID, or similar is more likely to achieve the scalability benefits that federation offers.
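The distribution point can be made concrete with a small Python simulation. This is not Azure code. It assigns rows to federation members by key range, once using an ever-increasing order ID and once using a customer ID. It then counts where the most recent ("hot") orders land. The member count, row counts, equal-width ranges, and helper names are all hypothetical.

```python
import bisect
import random
from collections import Counter


def member_for(key: int, boundaries: list[int]) -> int:
    """Range partitioning: member i holds keys in [boundaries[i-1], boundaries[i])."""
    return bisect.bisect_right(boundaries, key)


def recent_order_distribution(num_orders=100_000, num_customers=10_000,
                              recent=1_000, members=4, seed=42):
    rng = random.Random(seed)
    # Orders arrive in ID order; each one belongs to a random customer.
    orders = [(order_id, rng.randrange(num_customers)) for order_id in range(num_orders)]

    # Split each key space into equal-width ranges, one per federation member.
    order_bounds = [num_orders * i // members for i in range(1, members)]
    customer_bounds = [num_customers * i // members for i in range(1, members)]

    latest = orders[-recent:]  # the recent orders most users are working with
    by_order_id = Counter(member_for(oid, order_bounds) for oid, _ in latest)
    by_customer = Counter(member_for(cid, customer_bounds) for _, cid in latest)
    return by_order_id, by_customer


if __name__ == "__main__":
    by_order_id, by_customer = recent_order_distribution()
    print("federated on order ID:   ", dict(by_order_id))
    print("federated on customer ID:", dict(by_customer))
```

With the order ID as the key, all 1,000 recent orders fall into the last member. With the customer ID, they spread roughly evenly across all four members.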

When it comes to row uniqueness, you need to consider that the entities will be stored in different databases, which is why identity columns and sequence generation are not available. See Cihan Biyikoglu's blog post on this; his recommendation is to use uniqueidentifier or datetimeoffset.


Yossi dahan Feb 17 '12 at 9:02


In my projects I always use a GUID for the federation key, since I don't think it causes a serious performance problem. Maybe my projects are not that large, but it works for me. So my answer to your first question is yes.

As for your next question, I think an ID generator service, just as you suggested, is the way to go, but yes, it could become a bottleneck. One idea is an identifier pool that uses a distributed cache to store identifiers created by the service. When someone needs an identifier, it is taken from the pool rather than generated on demand. The ID generator keeps pushing identifiers into the pool, and consumers pull identifiers out of it. This may be useful, but again, I have never implemented it this way, so I cannot say whether it is best practice.
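The pool idea can be sketched in Python. An in-process bounded queue stands in for the distributed cache; a real deployment would need a shared cache service instead. A producer thread allocates identifiers in blocks from a hypothetical central counter and keeps the pool topped up. Consumers take IDs from the pool without calling the generator on every request. `CentralCounter` and `IdPool` are illustrative names, not an existing library.

```python
import queue
import threading


class CentralCounter:
    """Hypothetical stand-in for the central ID generator service."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def allocate_block(self, size: int) -> range:
        with self._lock:
            start = self._next
            self._next += size
        return range(start, start + size)


class IdPool:
    """A producer thread keeps a bounded queue filled; consumers pull IDs from it."""

    def __init__(self, counter: CentralCounter, capacity=1000, block_size=100):
        self._counter = counter
        self._block_size = block_size
        self._pool = queue.Queue(maxsize=capacity)
        self._stop = threading.Event()
        self._producer = threading.Thread(target=self._fill, daemon=True)
        self._producer.start()

    def _fill(self):
        while not self._stop.is_set():
            # One round trip to the central service yields a whole block of IDs.
            for new_id in self._counter.allocate_block(self._block_size):
                while not self._stop.is_set():
                    try:
                        # Blocks (briefly) while the pool is full, throttling the producer.
                        self._pool.put(new_id, timeout=0.1)
                        break
                    except queue.Full:
                        continue

    def take(self, timeout=5.0) -> int:
        """Return the next pooled ID; raises queue.Empty if none arrives in time."""
        return self._pool.get(timeout=timeout)

    def close(self):
        # IDs still in the pool or unpushed from the current block are discarded,
        # so the overall sequence may contain gaps.
        self._stop.set()
        self._producer.join()
```

The central service is now called once per block rather than once per ID. A short outage is absorbed as long as the pool still holds IDs. The cost is that unused IDs are lost on shutdown.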

Hope this helps.


Shaun xu Feb 16 '12 at 23:21


The only drawback of using a GUID as the primary key is that, if the table is clustered on the primary key, inserts will cause significant page splitting. This is because good GUIDs are deliberately not generated in chronological order, to make them hard to guess.
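The page-splitting effect can be illustrated with a toy Python model. It does not reproduce SQL Server's page structure. Keys are kept in a sorted list standing in for a clustered index, and the model records how often a new key lands after every existing key. Landing at the end corresponds to appending to the last page. Landing elsewhere corresponds to inserting into the middle of the index, which in a B-tree means splitting full pages. The metric and function name are assumptions for illustration.

```python
import bisect
import uuid


def fraction_appended_at_end(keys) -> float:
    """Insert keys one by one into a sorted list (a stand-in for a clustered index)
    and return the fraction of inserts that landed after every existing key."""
    index = []
    at_end = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            at_end += 1
        index.insert(pos, key)
    return at_end / len(keys)


if __name__ == "__main__":
    n = 10_000
    # Python orders UUIDs by their integer value. SQL Server sorts uniqueidentifier
    # by a different byte order, but random values are scattered either way.
    random_guids = [uuid.uuid4() for _ in range(n)]
    sequential_ints = list(range(n))
    print("uuid4 keys appended at end:     ", fraction_appended_at_end(random_guids))
    print("sequential keys appended at end:", fraction_appended_at_end(sequential_ints))
```

Every sequential key is appended at the end. Only a tiny fraction of random GUIDs are (on the order of ln(n)/n). Nearly every GUID insert therefore lands inside an existing page.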

Tables in Azure SQL do need a clustered index. My suggestion is to put the clustered index on a range-based value (e.g. a datetime) and use a nonclustered index for the primary key, which would be the GUID.


hocho Feb 17 '12 at 5:03


Simon munro · Accepted Answer · 2012-02-17T10:30:45+0000

You can generate sequences in the application in various ways, but none of them are straightforward because of the system's distributed nature. One good approach is to use blob storage with preconditions (conditional writes).
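Blob storage with preconditions works through optimistic concurrency:

1. Read the blob holding the next free ID, together with its ETag.
2. Write back a higher value with an If-Match condition on that ETag.
3. If another writer updated the blob first, retry.

Reserving a range of IDs per successful write keeps storage round trips rare. The sketch below uses a hypothetical in-memory `FakeBlob` class in place of the real Azure Storage client. Its method names are assumptions and are not the Azure SDK API.

```python
import threading
import uuid


class PreconditionFailed(Exception):
    """Raised when the ETag no longer matches (HTTP 412 in blob storage terms)."""


class FakeBlob:
    """Hypothetical in-memory stand-in for a blob that supports conditional writes."""

    def __init__(self, initial: int = 1):
        self._value = str(initial)
        self._etag = uuid.uuid4().hex
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value, self._etag

    def write_if_match(self, value: str, etag: str):
        with self._lock:
            if etag != self._etag:
                raise PreconditionFailed()
            self._value = value
            self._etag = uuid.uuid4().hex  # every successful write gets a new ETag


class BlobRangeIdGenerator:
    """Reserves batches of IDs from the shared blob, then hands them out locally."""

    def __init__(self, blob: FakeBlob, batch_size: int = 100, max_retries: int = 100):
        self._blob = blob
        self._batch_size = batch_size
        self._max_retries = max_retries
        self._next = 0
        self._limit = 0  # exclusive upper bound of the currently reserved range
        self._lock = threading.Lock()

    def _reserve_range(self):
        for _ in range(self._max_retries):
            text, etag = self._blob.read()
            start = int(text)
            try:
                # Succeeds only if nobody else updated the blob since our read.
                self._blob.write_if_match(str(start + self._batch_size), etag)
            except PreconditionFailed:
                continue  # another instance won the race; re-read and retry
            self._next, self._limit = start, start + self._batch_size
            return
        raise RuntimeError("could not reserve an ID range")

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:
                self._reserve_range()
            value = self._next
            self._next += 1
            return value
```

Each application instance holds its own generator, and instances coordinate only through the blob. The blob is touched once per batch. Larger batches mean fewer conflicts but larger gaps when an instance restarts.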

Depending on your project's timeline, you could use the SQL Server 2012 SEQUENCE object and put all your sequences in a small non-federated database. However, SEQUENCE is not yet available on SQL Azure.
