How can you auto-increment a non-primary-key identifier column scoped to a row in another table?

Suppose we are building a GitHub clone with two tables: repos and issues. Each repository has a set of issues, so the issues table has a foreign key repo_id.

When you view a repo's issues, you do not want them identified by the internal id. Instead, you need something like a number column that increments from 1..n within that repository only. The first issue in a new repo should be numbered 1, not the next id from the global issues table.

Of course, you need a way to do the incrementing, and number must be unique within a repo. In particular, you want to avoid any race condition in which the same number is generated twice.

What is the easiest way to handle this? A trigger? Something else entirely?

I use PostgreSQL but prefer approaches that are vanilla SQL where possible, e.g. triggers. If there is a simpler Postgres-specific approach, that would be useful too.

Any code demonstrating your approach will be extremely helpful. Thanks!

+6
4 answers

I do not think there is a way to do this without any possibility of a race. The following should minimize race conditions but not eliminate them. There may be better ways in specific database systems. Assuming REPOSITORY_ID is provided by your application code:

    insert into issues (repo_id, line_id)
    values (
        REPOSITORY_ID,
        coalesce((select max(line_id) + 1 from issues where repo_id = REPOSITORY_ID), 0)
    );

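This takes the current highest line_id for the repo and increments it during the insert. If the repo has no rows yet, it defaults to 0 (use 1 as the default if numbering should start at 1, as in the question). There is a small chance of a race if two inserts run at the same time, in which case both would compute the same value. If you add a unique constraint on (repo_id, line_id), the second insert fails instead, so you can catch the error and retry.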
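
The retry idea could be sketched roughly as follows in PL/pgSQL. This is only an illustration under assumptions: the constraint name issues_repo_line_uniq, the function name insert_issue_with_retry, the integer column types, and the limit of 5 attempts are all made up for the example, and it relies on PostgreSQL's default READ COMMITTED isolation so that each retry sees rows committed by the competing session.

```sql
-- Hypothetical constraint name; makes a duplicate number fail instead of being stored.
alter table issues
    add constraint issues_repo_line_uniq unique (repo_id, line_id);

-- Hypothetical helper: inserts an issue row and returns the line_id it was given.
create function insert_issue_with_retry(p_repo_id integer) returns integer as $$
declare
    v_line_id integer;
begin
    for attempt in 1..5 loop
        begin
            insert into issues (repo_id, line_id)
            values (
                p_repo_id,
                coalesce((select max(line_id) + 1 from issues where repo_id = p_repo_id), 0)
            )
            returning line_id into v_line_id;
            return v_line_id;
        exception when unique_violation then
            -- Another session took this number first; loop and recompute max + 1.
            null;
        end;
    end loop;
    raise exception 'could not allocate a line_id for repo % after 5 attempts', p_repo_id;
end
$$ language plpgsql;
```

A call such as select insert_issue_with_retry(42); would then return the number that was assigned, where 42 stands in for a real repo id.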

+1

Suppose you want to add a new issue to a specific repo. You can perform the following steps:

  • start a new explicit transaction;
  • select the repo you want to change with SELECT ... FOR UPDATE. This takes a row-level lock and prevents other transactions that want to add an issue to the same repo from proceeding at the same time;
  • get the next issue number for this repo (for example, you could keep a latest_issue column in repos, as in one of the other answers, or run a query to find the current maximum);
  • insert the new issue with that issue number;
  • commit the transaction: this releases the lock and lets other transactions waiting on the same repo continue.

You can wrap these steps in a stored procedure and call it every time you want to insert a new issue. Assuming there are not many concurrent transactions inserting issues for the same repository, this prevents race conditions and still performs reasonably well.
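
As a rough sketch of the steps above, something like the following PL/pgSQL function might work. The function name add_issue, the repos(id) and issues(repo_id, number) columns, and the integer types are assumptions taken from the question's wording rather than a known schema. The function runs inside the caller's transaction, so the row lock is held until that transaction commits or rolls back.

```sql
-- Hypothetical helper: allocates the next per-repo number and inserts the issue.
create function add_issue(p_repo_id integer) returns integer as $$
declare
    v_repo_id integer;
    v_number  integer;
begin
    -- Lock the repo row; concurrent callers for the same repo wait here.
    select id into v_repo_id from repos where id = p_repo_id for update;
    if not found then
        raise exception 'repo % does not exist', p_repo_id;
    end if;

    -- Safe to read the maximum now, since no other caller holds the lock.
    select coalesce(max(number), 0) + 1 into v_number
    from issues
    where repo_id = p_repo_id;

    insert into issues (repo_id, number) values (p_repo_id, v_number);
    return v_number;
end
$$ language plpgsql;
```

Calling select add_issue(42); on its own runs in an implicit transaction, so the lock is released as soon as the statement finishes.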

+1

I would keep a latest_issue column in repos and initialize it to 0.
In issues I would create an nr column with a UNIQUE constraint on (repo_id, nr), where repo_id is the foreign key column in issues.

Whenever an issue is created, latest_issue in repos is incremented. The new value is then used as nr for the issue.
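
One way this could look in PostgreSQL is shown below. It is a sketch, not a definitive schema: the id columns and the literal repo id 42 are assumptions for illustration. The UPDATE locks the repo row, so two concurrent inserts for the same repo are serialized, and RETURNING hands the new counter value to the INSERT in a single statement.

```sql
-- Assumed table definitions; only latest_issue, repo_id and nr come from the answer.
create table repos (
    id           serial primary key,
    latest_issue integer not null default 0
);

create table issues (
    id      serial primary key,
    repo_id integer not null references repos(id),
    nr      integer not null,
    unique (repo_id, nr)
);

-- Increment the counter and insert the issue in one statement (42 is an example repo id).
with next as (
    update repos
    set latest_issue = latest_issue + 1
    where id = 42
    returning latest_issue
)
insert into issues (repo_id, nr)
select 42, latest_issue from next;
```

If the repo id does not exist, the UPDATE returns no row and nothing is inserted.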

0

Here's how I would handle it:

    create table repo(
        id serial primary key
    );

    create table issue(
        id integer not null,
        id_repo integer not null references repo(id),
        primary key (id, id_repo)
    );

    -- creates one sequence per repo, named issue_<repo id>_seq
    create function create_issue_seq() returns trigger as $$
    begin
        execute format('create sequence issue_%s_seq', new.id);
        return new;
    end
    $$ language plpgsql;

    create trigger create_issue_seq
        after insert on repo
        for each row execute procedure create_issue_seq();

    -- fills issue.id from the sequence that belongs to the issue's repo
    create function assign_issue_id() returns trigger as $$
    begin
        new.id = nextval(format('issue_%s_seq', new.id_repo));
        return new;
    end
    $$ language plpgsql;

    create trigger assign_issue_id
        before insert on issue
        for each row execute procedure assign_issue_id();

One trigger creates a sequence of issue identifiers after a repo is created (dedicated to that repo), and the second uses that dedicated sequence to populate the issue identifier before the issue is inserted.
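
For illustration, a short session against the tables and triggers above might look like this. It assumes freshly created, empty tables, so the two repos get ids 1 and 2.

```sql
-- Each insert into repo fires create_issue_seq and creates issue_<id>_seq.
insert into repo default values;
insert into repo default values;

-- id is left out on purpose; the assign_issue_id trigger fills it in per repo.
insert into issue (id_repo) values (1), (1), (2);

select id_repo, id from issue order by id_repo, id;
-- id_repo | id
-- --------+----
--       1 |  1
--       1 |  2
--       2 |  1
```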

Pros:

  • race-free, since it uses real sequences
  • no exclusive lock required

Cons:

  • it creates many sequences (although a quick check with 1M repositories showed no serious performance loss).

Notes:

  • It is probably advisable to also add a trigger that drops the sequence after its repo is deleted.
  • I assumed an immutable repo id (which I strongly believe is the right way: primary keys should be immutable).
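
The cleanup trigger mentioned in the notes could be sketched as follows. The names drop_issue_seq are hypothetical and simply mirror the naming used for the other triggers in this answer.

```sql
-- Drops the per-repo sequence once its repo row is gone.
create function drop_issue_seq() returns trigger as $$
begin
    execute format('drop sequence if exists issue_%s_seq', old.id);
    return old;
end
$$ language plpgsql;

create trigger drop_issue_seq
    after delete on repo
    for each row execute procedure drop_issue_seq();
```

Because issue.id_repo references repo(id), a repo can only be deleted after its issues are gone, so the sequence is no longer needed at that point.
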
0
