Your stated guarantees apply in this simple case, but not necessarily in slightly more complicated queries. See the end of the answer for examples.
Simple case
Assuming col1 is unique, has exactly one row with the value 2, or has a stable ordering, so every UPDATE matches the same rows in the same order.
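For concreteness, here is a minimal sketch of the setup being discussed, using the table and column names from the queries later in this answer (the exact DDL is an assumption for illustration):

-- Hypothetical setup: exactly one row currently has col = 2.
CREATE TABLE test (col integer);
INSERT INTO test (col) VALUES (2), (3), (4);

-- The statement every concurrent session runs:
UPDATE test SET col = 1 WHERE col = 2;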
What will happen for this query is that the threads will find the row with col = 2 and will all try to grab a write lock on that tuple. Exactly one of them will succeed. The rest will block, waiting for the first thread's transaction to commit.
That first tx will write, commit, and report a rowcount of 1. The commit will release the lock.
The other tx's will then try to grab the lock again. One by one they will succeed. Each transaction, in turn, goes through the following process:
- Obtain the write lock on the contested tuple.
- Re-check the WHERE col = 2 condition after acquiring the lock.
- The re-check shows that the condition no longer matches, so the UPDATE skips that row.
- The UPDATE matches no other rows, so it reports zero rows updated.
- Commit, releasing the lock for the next tx waiting to grab it.
In this simple case, row-level locking and the condition re-check effectively serialize the updates. In more complex cases, not so much.
You can easily demonstrate this yourself. Open, say, four psql sessions. In the first, lock the table with BEGIN; LOCK TABLE test;*. In the other sessions run identical UPDATEs - they will block on the table-level lock. Now release the lock by COMMITting your first session and watch them race. Only one will report a rowcount of 1; the rest will report 0. This is easily scripted so you can repeat it and scale it up to more connections/threads.
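A sketch of that demonstration, assuming the test table above (the session labels are only illustrative):

-- Session 1: take a table-level lock so the other sessions queue up.
BEGIN;
LOCK TABLE test;

-- Sessions 2-4: run the identical UPDATE; each one blocks on the table lock.
UPDATE test SET col = 1 WHERE col = 2;

-- Session 1: release the lock and let the blocked sessions race.
COMMIT;

-- Expected outcome: exactly one session reports UPDATE 1, the others UPDATE 0.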
To learn more, read Rules for Concurrent Writing, page 11 of PostgreSQL Concurrency Issues - and then read the rest of that presentation.
And if col1 is not unique?
As Kevin noted in the comments, if col isn't unique, so the UPDATE can match multiple rows, then different executions of the UPDATE can get the rows in different orders. This can happen if they choose different plans (for example, one goes through PREPARE and EXECUTE and another is direct, or you're playing with the enable_ GUCs), or if the plan they all use produces an unstable sort of equal values. If they get the rows in a different order, tx1 will lock one tuple and tx2 will lock another, and then each will try to take a lock on the tuple the other has already locked. PostgreSQL will abort one of them with a deadlock exception. This is yet another good reason why all your database code should always be prepared to retry transactions.
If you are careful to make sure that concurrent UPDATEs always get the same rows in the same order, you can still rely on the behavior described in the first part of this answer.
Frustratingly, PostgreSQL does not offer UPDATE ... ORDER BY, so ensuring that your updates always select the same rows in the same order is not as simple as you might wish. A SELECT ... FOR UPDATE ... ORDER BY followed by a separate UPDATE is the safest approach; a sketch of that pattern follows.
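A minimal sketch of that pattern for the non-unique case, assuming a hypothetical unique id column to give the rows a stable order (the id column is not part of the original example):

BEGIN;

-- Lock all matching rows in a deterministic order first ...
SELECT id FROM test WHERE col = 2 ORDER BY id FOR UPDATE;

-- ... then update them in a separate statement; because every session takes
-- the row locks in the same order, concurrent runs should not deadlock.
UPDATE test SET col = 1 WHERE col = 2;

COMMIT;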
More complex queries, queuing systems
If you execute queries with multiple phases, involving multiple tuples, or using conditions other than equality, you can get surprising results that differ from the results of a serial execution. In particular, concurrent runs of anything like:
UPDATE test SET col = 1 WHERE col = (SELECT t.col FROM test t ORDER BY t.col LIMIT 1);
or other attempts to build a simple queue system will *fail* to work the way you expect. See the PostgreSQL docs on concurrency and this presentation for more information; a sketch of the failure mode follows.
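A sketch of one way the surprise can show up with two concurrent sessions under the default READ COMMITTED isolation level (this is one plausible interleaving, not a guaranteed outcome):

-- Session A:
BEGIN;
UPDATE test SET col = 1 WHERE col = (SELECT t.col FROM test t ORDER BY t.col LIMIT 1);
-- picks and locks the row with the smallest col

-- Session B runs the same statement concurrently and blocks on that row.

-- Session A:
COMMIT;

-- Session B: once the lock is released it re-checks the locked row, finds the
-- condition no longer matches, and reports UPDATE 0 - a different result than
-- running the two statements one after the other would have produced.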
If you need a work queue backed by a database, there are well-tested solutions that handle all the surprisingly complicated corner cases. One of the most popular is PgQ. There is a useful PgCon paper on the topic, and a Google search for 'postgresql queue' turns up plenty of useful results.
* BTW, instead of LOCK TABLE you can use SELECT 1 FROM test WHERE col = 2 FOR UPDATE; to take a write lock on just that tuple. That blocks updates against it, but does not block writes to other tuples or block any reads. This lets you simulate different kinds of concurrency problems.
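For example, a variant of the demonstration above that serializes only the writers of the contested row (same assumptions as before):

-- Session 1: lock only the row with col = 2.
BEGIN;
SELECT 1 FROM test WHERE col = 2 FOR UPDATE;

-- Sessions 2-4: their UPDATE ... WHERE col = 2 blocks, but reads and writes
-- to other rows in test proceed normally.

-- Session 1: release the row lock; the blocked UPDATEs race as before.
COMMIT;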