What should I consider when choosing a data type for my primary key?

When I create a new database table, what factors should be considered when choosing a primary key data type?

+6
sql database-design
16 answers

Sorry, but I found that I had already answered this in related questions (you can check this and this). I have adapted those answers a little...

You will find many posts devoted to this problem, and each choice you make has its pros and cons. The arguments for them usually relate to the theory of relational databases and database performance.

On this issue, my point of view is very simple: surrogate primary keys ALWAYS work, while natural keys MIGHT NOT ALWAYS work one of these days, for several reasons: the field is too short, the rules change, etc.

At this point, you have guessed that I am basically a member of the uniqueIdentifier / surrogate primary key team, and even though I appreciate and understand arguments such as the ones presented here, I am still looking for a case where a "natural" key is better than a surrogate...

In addition to this, one of the most important but always forgotten arguments in favor of this basic rule is related to code normalization and developer productivity:

Every time I create a table, should I lose time

  • identifying its primary key and its physical characteristics (type, size)?
  • remembering these characteristics every time I want to refer to it in my code?
  • explaining my choice of PK to other developers on the team?

My answer is no to all of these questions:

  • I don't have time to lose trying to determine the "best natural primary key" when the surrogate option gives me a bulletproof solution.
  • I don't want to remember that the primary key of my table_whatever is a 10-character string when I write the code.
  • I don't want to waste time negotiating the length of the natural key: "OK, if you need 10, why don't you take 12 to be on the safe side?" This "safe side" argument really annoys me: if you want to stay on the safe side, it means that you are really close to the unsafe side! Choose a surrogate: it is bulletproof!

So, I have been working for the last five years with a very simple rule: each table (let's call it 'myTable') has its first field called 'id_MyTable', which is of uniqueIdentifier type. Even if the table supports a many-to-many relationship, where a combination of fields would make a perfectly acceptable primary key, I prefer to create this 'id_myManyToManyTable' field as a uniqueIdentifier, just to stick to the rule, and because, in the end, it does not hurt.

The main advantage is that you no longer have to think about primary keys and / or foreign keys in your code. Once you have a table name, you know the name and type of its PK. Once you know which relationships are implemented in your data model, you know the names of the foreign keys available in the table.

And if you still want to have your "Natural Key" somewhere in your table, I advise you to build it using a standard model, for example

 Tbl_whatever
    id_whatever, unique identifier, primary key
    code_whatever, whateverTypeYouWant(whateverLengthYouEstimateTheRightOne), indexed
    .....

Here id_ is the prefix for the primary key, and code_ is used for the "natural" indexed field. Some would argue that the code_ field should be declared unique. This is true, and it can easily be enforced either through DDL or through external code. Note that many "natural" keys are calculated (account numbers), so they are already generated by code.
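To make the convention concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration: SQLite has no uniqueidentifier type, so the GUID is stored as TEXT, and the index name `ix_whatever_code`, the helper `insert_whatever` and the sample code `ACC-0001` are made up for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")

# id_ prefix for the surrogate primary key, code_ prefix for the "natural" field.
# Assumption: the GUID is stored as TEXT because SQLite has no GUID column type.
conn.execute(
    """
    CREATE TABLE Tbl_whatever (
        id_whatever   TEXT PRIMARY KEY,
        code_whatever VARCHAR(12) NOT NULL
    )
    """
)
# Uniqueness of the natural key is enforced through DDL, as a unique index.
conn.execute(
    "CREATE UNIQUE INDEX ix_whatever_code ON Tbl_whatever (code_whatever)"
)


def insert_whatever(code):
    """Insert a row with a freshly generated GUID and return that GUID."""
    new_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Tbl_whatever (id_whatever, code_whatever) VALUES (?, ?)",
        (new_id, code),
    )
    return new_id


first_id = insert_whatever("ACC-0001")

# A second row with the same natural code is rejected by the unique index.
try:
    insert_whatever("ACC-0001")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The calling code never needs to know the type or length of `code_whatever`; it only ever joins and looks up on `id_whatever`.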

I'm not sure my rule is the best. But it is very effective! If everyone used it, we would, for example, avoid wasting time answering this kind of question!

+11

If you use a numeric key, make sure the data type is large enough to hold the number of rows the table can be expected to grow to.

If you use a GUID, is the extra space needed to store the GUID a concern? Will coding against GUID PKs be a pain for developers or users of the application?

If you use composite keys, are you sure that the combined columns will always be unique?
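A quick way to sanity-check the "large enough" question is to compare the expected row count, with some headroom, against the range of the integer types. The sketch below is only arithmetic; the function name `smallest_integer_type` and the default `growth_factor` of 10 are assumptions chosen for the example, not a recommendation.

```python
# Maximum values of signed 32-bit and 64-bit integers.
INT_MAX = 2**31 - 1
BIGINT_MAX = 2**63 - 1


def smallest_integer_type(expected_rows, growth_factor=10):
    """Return the smallest integer key type that fits the expected rows
    multiplied by a safety margin (hypothetical default: 10x)."""
    needed = expected_rows * growth_factor
    if needed <= INT_MAX:
        return "INT"
    if needed <= BIGINT_MAX:
        return "BIGINT"
    raise ValueError("row count exceeds the range of a 64-bit integer key")


small_table = smallest_integer_type(1_000_000)
large_table = smallest_integer_type(500_000_000)
```

Keep in mind that identity values are also consumed by deleted rows and rolled-back inserts, so the highest key is usually larger than the live row count.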
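One way to test whether a candidate composite key really is unique is to group the existing data by those columns and look for groups with more than one row. This is a sketch with a made-up `enrollment` table; the column names are interpolated into the SQL, so they must come from trusted code, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (student_id INTEGER, course_id INTEGER, term TEXT)"
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [
        (1, 10, "2024A"),
        (1, 11, "2024A"),
        (1, 10, "2024B"),  # same student and course, different term
        (2, 10, "2024A"),
    ],
)


def duplicate_keys(columns):
    """Return the key values (plus their count) that occur more than once."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM enrollment "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


two_column_dupes = duplicate_keys(["student_id", "course_id"])
three_column_dupes = duplicate_keys(["student_id", "course_id", "term"])
```

Here the two-column key turns out not to be unique, while adding `term` fixes it. A clean result on today's data is still no guarantee for tomorrow's.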

+7

I don't like what they teach in school about using a "natural key" (for example, the ISBN in a book database) or even having a primary key made up of 2 or more fields. I would never do that. So here is my little piece of advice:

  • Always have one dedicated column in every table for your primary key.
  • All of them should have the same column name across all tables, e.g. "ID" or "GUID".
  • Use GUIDs when you can (if you don't need performance), otherwise incrementing INTs.

EDIT:
Well, I think I need to explain my choices a bit.

  • Having a dedicated column with the same name across all tables for your primary key simply makes your SQL statements easier to write and easier for someone else (who may not be familiar with the database layout) to understand, especially when you do a lot of JOINs and the like. You will not need to look up what the primary key of a particular table is; you already know, because it is the same everywhere.

  • GUIDs vs INTs does not really matter that much most of the time. Unless you hit the performance limits of GUIDs or need to merge databases, you will not have serious problems with one or the other. BUT there is a reason why I prefer GUIDs. The global uniqueness of a GUID may come in handy some day. You may not see the need for it now, but things like synchronizing parts of a database to a laptop / cell phone, or even finding a record without having to know which table it is in, are great examples of the benefits GUIDs can provide. An integer only identifies a record within the context of a single table, while a GUID identifies a record everywhere.
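The merge argument can be sketched as follows, again with Python's sqlite3 and made-up names (`make_db`, `merge`, a `customer` table). Two independently created databases are merged by copying rows with their keys: auto-incremented integers collide, GUIDs do not. Treat it as an illustration only.

```python
import sqlite3
import uuid


def make_db(key_kind, names):
    """Create an in-memory database with a customer table keyed by
    either an auto-incrementing integer or a GUID stored as TEXT."""
    conn = sqlite3.connect(":memory:")
    if key_kind == "int":
        conn.execute(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
        )
        conn.executemany(
            "INSERT INTO customer (name) VALUES (?)", [(n,) for n in names]
        )
    else:
        conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            [(str(uuid.uuid4()), n) for n in names],
        )
    return conn


def merge(target, source):
    """Copy all rows from source into target, keeping the original keys.
    Returns False if a primary key collision prevents the merge."""
    rows = source.execute("SELECT id, name FROM customer").fetchall()
    try:
        target.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


# Both integer databases start counting at 1, so id 1 exists on both sides.
int_merge_ok = merge(make_db("int", ["Ann", "Bob"]), make_db("int", ["Cid"]))
guid_merge_ok = merge(make_db("guid", ["Ann", "Bob"]), make_db("guid", ["Cid"]))
```

With integer keys the merge would need the rows (and every foreign key pointing at them) to be renumbered first.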

+7

In most cases I use an identity int primary key, unless the scenario requires a lot of replication, in which case I may opt for a GUID.

I (almost) never use meaningful keys.

+6

Unless you have an ultra-convenient natural key, always use a synthetic (aka surrogate) key of a numeric type. Even if you do have a natural key, you may want to use a synthetic key anyway and add an additional unique index on your natural key. Think about what happened to the many databases that used social security numbers as PKs: when federal law changed, the cost of switching to synthetic keys was huge.

In addition, I have to disagree with the practice of naming every primary key the same, e.g. "ID". This makes queries harder to understand, not easier. Primary keys should be named after the table. For example, employee.employee_id, affiliate.affiliate_id, user.user_id, etc.
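Here is a small sketch of why the synthetic key makes such a rule change cheap. The tables `employee` and `payslip`, the column `national_id` and the replacement value `E-000001` are invented for the example; the point is only that child rows reference the synthetic key, so changing the natural identifier touches one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Synthetic numeric key as PK; the natural identifier only gets a UNIQUE constraint.
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        national_id TEXT NOT NULL UNIQUE,
        name        TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE payslip (
        payslip_id  INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
        amount      INTEGER
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, '123-45-6789', 'Ann')")
conn.execute("INSERT INTO payslip VALUES (1, 1, 2500)")

# The rules change: the old identifier may no longer be stored.
# Only the employee row is updated; payslip rows are untouched.
conn.execute("UPDATE employee SET national_id = 'E-000001' WHERE employee_id = 1")

row = conn.execute(
    """
    SELECT e.national_id, p.amount
    FROM payslip p
    JOIN employee e ON e.employee_id = p.employee_id
    """
).fetchone()
```

Had `national_id` been the primary key, every referencing table would have needed the same update.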

+4

Do not use a floating point type, since floating point numbers cannot be reliably compared for equality.
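A tiny illustration of the problem, using Python floats and a SQLite REAL column (both are IEEE 754 doubles). The `reading` table is made up for the example.

```python
import sqlite3

# Two values that "should" be the same key are not equal as floats.
key_a = 0.1 + 0.2
key_b = 0.3
same_key = key_a == key_b

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id REAL PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO reading VALUES (?, ?)", (key_a, "stored"))

# Looking the row up by the mathematically equal value finds nothing.
found = conn.execute(
    "SELECT label FROM reading WHERE id = ?", (key_b,)
).fetchone()
```

The row is in the table, but a lookup by the value you expect to match comes back empty.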

+2

  • Where do you create it? An incrementing number is not suitable for keys generated by the client.
  • Do you want a data-dependent or an independent key (sometimes you could use an ID from the business data; I can't say whether that is always useful or not)?
  • How well can this type be indexed by your database?

I have used unique identifiers (GUIDs) or incrementing integers so far.

Cheers Matthias

+1

Numbers that have meaning in the real world are usually a bad idea, because every so often the real world changes the rules about how those numbers are used, in particular by allowing duplicates, and then you have a real mess on your hands.

+1

I usually use an integer, but here is an interesting perspective.

http://www.codinghorror.com/blog/archives/000817.html

0

I am partial to using a generated integer key. If you expect the database to grow very large, you can go with bigint.

Some people like to use GUIDs. The pro is that you can merge multiple instances of the database without changing any keys; the con is that performance can suffer.

0

For a "natural" key, use whatever data type suits the column(s). Artificial (surrogate) keys are usually integers.

0

It all depends.

a) Are you fine with having unique sequential numeric values as your primary key? If so, then choosing UniqueIdentifier as the primary key will suffice.
b) If your business requirement is such that you need an alphanumeric primary key, you need to go for varchar or nvarchar.

These are two options that I could think of.

0

A big factor is how much data you are going to store. I work for a web analytics company and we have LOADS of data. A GUID primary key on our pageview table would kill us because of its size.

Rule of thumb: for high performance, you should be able to keep your entire index in memory. GUIDs could easily break this!
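A back-of-the-envelope sketch of the size argument. It only counts the raw key bytes (4 for a 32-bit int, 8 for a 64-bit bigint, 16 for a GUID) and ignores per-row and per-page overhead, so real index sizes will be larger. The function name `raw_key_bytes` and the one-billion-row figure are assumptions for the example.

```python
# Raw storage size of one key value, in bytes.
KEY_BYTES = {"INT": 4, "BIGINT": 8, "GUID": 16}


def raw_key_bytes(rows, key_type):
    """Bytes needed just for the key values of a primary key index.
    Lower bound only: index structure overhead is not included."""
    return rows * KEY_BYTES[key_type]


pageviews = 1_000_000_000  # hypothetical table size

int_size = raw_key_bytes(pageviews, "INT")
bigint_size = raw_key_bytes(pageviews, "BIGINT")
guid_size = raw_key_bytes(pageviews, "GUID")
guid_vs_int = guid_size // int_size
```

The key is also repeated in every foreign key column and in every secondary index that carries it, so the factor applies more than once.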

0

Use natural keys when you can trust them. Some sources of natural keys cannot be trusted. Years ago, the Social Security Administration would occasionally slip up and assign the same SSN to two different people. They have probably fixed that by now.

You can probably trust VINs for vehicles and ISBNs for books (but not for pamphlets, which may not have an ISBN).

If you use natural keys, the natural key will determine the data type.

If you cannot trust natural keys, create a synthetic key. I prefer integers for this purpose. Leave enough room for reasonable expansion.

0

I usually use a GUID column as the primary key for all tables (rowguid in mssql). Whatever could be a natural key I make a unique constraint. A typical example is a product identification number, which the user has to make up and ensure is unique. If I need a sequence, for example on an invoice, I create a table to hold the last number and a stored procedure to ensure serialized access. Or a Sequence in Oracle :-) I hate the "social security number" pattern for natural keys, as that number will not always be available during the registration process. The result is the need for a scheme to generate dummy numbers.
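The "table holding the last number" idea can be sketched like this. SQLite has no stored procedures, so a Python function stands in for one, and `BEGIN IMMEDIATE` takes the write lock up front so that two callers cannot read the same value. The names `last_number` and `next_number` are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE last_number (name TEXT PRIMARY KEY, value INTEGER NOT NULL)"
)
conn.execute("INSERT INTO last_number VALUES ('invoice', 0)")


def next_number(name):
    """Increment and return the counter for the given sequence name."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading
    try:
        conn.execute(
            "UPDATE last_number SET value = value + 1 WHERE name = ?", (name,)
        )
        row = conn.execute(
            "SELECT value FROM last_number WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown sequence: {name}")
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


numbers = [next_number("invoice") for _ in range(3)]
```

Because the increment is part of a transaction, a rollback gives the number back, which is what you want for gap-free invoice numbers and what a plain database sequence typically does not do.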

0

Whenever possible, try to use a primary key that is a natural key. For example, if I had a table in which I logged one record every day, the log date would be a good primary key. Otherwise, if there is no natural key, just use int. If you think you will have more than 2 billion rows, use bigint. Some people like to use GUIDs, which works well, as they are unique and you will never run out of them. However, they are needlessly long and hard to type if you are just doing ad hoc queries.

-1
