I have a system that uses a complex primary key when interacting with external systems, and a quick, small, opaque primary key for internal use. For example: the external key might be a compound value, something like (first name (varchar), last name (varchar), zip code (char)), while the internal key is an integer ("customer ID").
When I receive an incoming request keyed by the external value, I need to look up the internal key, and the hard part is that I need to assign a new internal key if I do not already have one for that external ID.
Obviously, if only one client is talking to the database at a time, this is straightforward: SELECT customer_id FROM customers WHERE given_name = 'foo' AND ..., then INSERT INTO customers VALUES (...) if no row is found. But if many requests can arrive from external systems at the same time, and several of them may refer to the same previously-unseen customer, there is a race condition where multiple clients try to INSERT the new row at once.
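For concreteness, here is a minimal sketch of that naive flow. The table and column names are illustrative, and the identity-column syntax is the standard/PostgreSQL form (MySQL would use AUTO_INCREMENT instead):

```sql
-- Illustrative schema: compound external key plus an opaque internal key.
CREATE TABLE customers (
    customer_id  INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,  -- internal key
    given_name   VARCHAR(100) NOT NULL,
    family_name  VARCHAR(100) NOT NULL,
    zip_code     CHAR(10)     NOT NULL,
    UNIQUE (given_name, family_name, zip_code)                          -- external compound key
);

-- Naive get-or-create: fine with one client, racy with many.
SELECT customer_id
  FROM customers
 WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345';

-- If no row came back, insert one. Another client may insert the same key
-- between the SELECT and this INSERT; that is the race.
INSERT INTO customers (given_name, family_name, zip_code)
VALUES ('foo', 'bar', '12345');
```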
If I were modifying an existing row, this would be easy: just SELECT FOR UPDATE first to take the corresponding row-level lock before doing the UPDATE. But in this case I have no row to lock, because the row does not exist yet!
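For contrast, a sketch of the existing-row case, where the row-level lock is available (PostgreSQL-flavored transaction syntax, same illustrative table as above):

```sql
BEGIN;
-- Lock the existing row so concurrent writers serialize on it.
SELECT customer_id
  FROM customers
 WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345'
   FOR UPDATE;

-- Safe to modify now; no other transaction can update or delete this row until COMMIT.
UPDATE customers
   SET zip_code = '54321'
 WHERE given_name = 'foo' AND family_name = 'bar' AND zip_code = '12345';
COMMIT;
```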
So far I have come up with several solutions, but each of them has quite serious problems:
- Catch the error on INSERT and retry the entire transaction from the top. This is a problem if the transaction involves a dozen customers, especially if the incoming data can mention the same customers in a different order each time. It can get stuck in cycles of mutual conflict, where every retry collides with another client's; exponential backoff between retries mitigates that, but it is a slow and expensive way to resolve conflicts. It also makes the application code considerably more complex, since everything has to be replayable. (A sketch of one attempt appears after this list.)
- Use savepoints. Set a savepoint before the SELECT, catch the error on the INSERT, then roll back to the savepoint and SELECT again. Savepoints are not entirely portable; their semantics and capabilities differ slightly and subtly between databases. The biggest difference I have noticed is that they sometimes seem to nest and sometimes don't, so it would be nice if I could avoid them. That is only a vague impression, though. Is it inaccurate? Are savepoints standardized, or at least consistent in practice? Savepoints also make it hard to parallelize work within a single transaction, because you can't say exactly how much work will be rolled back, although I realize I may just have to live with that. (See the sketch after this list.)
- Take some kind of global lock, for example a table-level lock via the LOCK statement (Oracle, MySQL, Postgres). This obviously slows these operations down and causes a lot of lock contention, so I would rather avoid it. (A short example follows the list.)
- Take a finer-grained but database-specific lock. The only mechanism I know of is Postgres advisory locks, which are definitely not supported in other databases (the functions even start with "pg_"), so again there is a portability problem. The Postgres approach would also require me to squeeze the key into a pair of integers, which it may not fit into. Is there a better way to take a lock on a hypothetical object? (A sketch follows the list.)
It seems to me that this must be a common problem with SQL databases, but I have not been able to find many resources on it, perhaps just because I do not know the canonical name for it. Is there some simple extra bit of syntax that makes this possible in any of the tagged databases?
Glyph