Will UUID as a primary key in PostgreSQL give poor index performance?

I created an application in Rails on Heroku using a PostgreSQL database.

It has a couple of tables designed for synchronization with mobile devices, where data can be created in different places. Therefore, in addition to the auto-incrementing primary key, I have a uuid field: a string storing a GUID. The uuid is what gets transmitted between the server and the clients.

After implementing the synchronization mechanism on the server side, I realized that it leads to performance problems, because I have to map between uuid ↔ id all the time (when writing objects, I need to look up the id from the uuid before saving, and vice versa when sending data).

Now I'm thinking of switching to the UUID as the primary key, which would make both writing and reading a lot simpler and faster.

I have read that a UUID as a primary key can sometimes give poor index performance (index fragmentation) when a clustered primary key index is used. Does PostgreSQL suffer from this problem, or is it OK to use a UUID as the primary key?

Since I already have a UUID column, keeping it seems wiser: I would just drop the regular id column.
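To illustrate the proposed scheme (a sketch, not code from the question): each device, or the Rails server itself, can generate a v4 UUID with Ruby's standard library, so a record created anywhere already carries its final primary key and no uuid ↔ id mapping is needed.

```ruby
require 'securerandom'

# Sketch: the client (or the Rails server) generates the record's
# identifier up front, so the same value serves as the primary key
# everywhere and no separate integer id has to be looked up.
uuid = SecureRandom.uuid   # random (version 4) UUID

puts uuid
# Canonical 8-4-4-4-12 layout; version nibble is 4, variant is 8/9/a/b.
puts(uuid =~ /\A\h{8}-\h{4}-4\h{3}-[89ab]\h{3}-\h{12}\z/ ? "valid" : "invalid")
```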

+50
ruby ruby-on-rails postgresql heroku
Oct 30 '12 at 19:05
2 answers

(I'm working on Postgres Heroku)

We use UUIDs as primary keys on several systems, and it works great.

I recommend that you use the uuid-ossp extension, and even let Postgres generate the UUIDs for you:

    heroku pg:psql
    psql (9.1.4, server 9.1.6)
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
    Type "help" for help.

    dcvgo3fvfmbl44=> CREATE EXTENSION "uuid-ossp";
    CREATE EXTENSION
    dcvgo3fvfmbl44=> CREATE TABLE test (id uuid primary key default uuid_generate_v4(), name text);
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
    CREATE TABLE
    dcvgo3fvfmbl44=> \d test
                    Table "public.test"
     Column | Type |              Modifiers
    --------+------+-------------------------------------
     id     | uuid | not null default uuid_generate_v4()
     name   | text |
    Indexes:
        "test_pkey" PRIMARY KEY, btree (id)
    dcvgo3fvfmbl44=> insert into test (name) values ('hgmnz');
    INSERT 0 1
    dcvgo3fvfmbl44=> select * from test;
                      id                  | name
    --------------------------------------+-------
     e535d271-91be-4291-832f-f7883a2d374f | hgmnz
    (1 row)

EDIT: performance implications

It will always depend on your workload.

An integer primary key has the advantage of locality, where related data sits close together. This can be useful, for example, for range queries such as WHERE id BETWEEN 1 AND 10000, although lock contention may be worse.

If your read workload is essentially random, in that you always perform primary key lookups, there should be no measurable performance degradation: you only pay for the larger data type.
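"Paying for the larger data type" can be roughly quantified (an illustrative back-of-the-envelope sketch: the byte sizes below are PostgreSQL's on-disk sizes for int4 and uuid, stated here as constants, not measured; index and tuple overhead are ignored):

```ruby
# On-disk key sizes in PostgreSQL: int4 = 4 bytes, uuid = 16 bytes
# (uuid is stored as 16 raw bytes, not as 36-character text).
INT4_BYTES = 4
UUID_BYTES = 16

rows  = 10_000_000
extra = rows * (UUID_BYTES - INT4_BYTES)  # extra key bytes vs. a serial id
puts "extra key bytes for #{rows} rows: #{extra / (1024.0 * 1024)} MB"
```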

Do you write a lot to this table, and is the table very large? Although I have not measured it, there are probably consequences for maintaining that index. For many datasets, though, UUIDs are just fine, and using UUIDs as identifiers has some nice properties.

Finally, I may not be the most qualified person to discuss or advise on this, since I have never run a table large enough with a UUID PK for this to become a problem. YMMV. (That said, I would love to hear from people who have run into problems with this approach!)

+55
Oct 30

As the accepted answer states, range queries may be slower in this case, and not only range queries on id.

Auto-increment ids are naturally ordered by creation date, so with auto-increment the data is stored roughly chronologically on disk (see B-Tree), which speeds up reads (no HDD seeking). For example, if you list all users, the natural order is by creation date, which is the same as the auto-increment order, so such range queries will be faster on hard drives. On SSDs, I believe, the difference is negligible, since SSDs are by design random-access (no head seeks, no mechanical parts involved, pure electronics).
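The ordering difference can be sketched in plain Ruby (illustrative only; this models index ordering, not actual disk behavior): sequential ids arrive already in sorted order, while random v4 UUIDs do not, so each UUID insert lands at an unpredictable position in the b-tree.

```ruby
require 'securerandom'

# Sequential ids: each new value is larger than the last, so inserts
# always append at the "right edge" of a b-tree index.
ids = (1..8).to_a
puts ids.sort == ids        # true: insertion order == index order

# Random v4 UUIDs: insertion order and sorted (index) order diverge,
# which is the source of the fragmentation concern on clustered indexes.
uuids = Array.new(8) { SecureRandom.uuid }
puts uuids.sort == uuids    # almost certainly false for random UUIDs
```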

+1
Aug 23 '16 at 15:33


