Phantom read anomaly in Oracle and PostgreSQL does not roll back the transaction

I noticed the following behavior in both Oracle and PostgreSQL.

Given that we have the following database schema:

create table post (
    id int8 not null,
    title varchar(255),
    version int4 not null,
    primary key (id)
);

create table post_comment (
    id int8 not null,
    review varchar(255),
    version int4 not null,
    post_id int8,
    primary key (id)
);

alter table post_comment
    add constraint FKna4y825fdc5hw8aow65ijexm0
    foreign key (post_id) references post;

With the following data:

insert into post (title, version, id) values ('Transactions', 0, 1);
insert into post_comment (post_id, review, version, id) values (1, 'Post comment 1', 459, 0);
insert into post_comment (post_id, review, version, id) values (1, 'Post comment 2', 537, 1);
insert into post_comment (post_id, review, version, id) values (1, 'Post comment 3', 689, 2);

If I open two separate SQL consoles and run the following statements:

TX1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
TX2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
TX1: SELECT COUNT(*) FROM post_comment WHERE post_id = 1;
TX1: > 3
TX1: UPDATE post_comment SET version = 100 WHERE post_id = 1;
TX2: INSERT INTO post_comment (post_id, review, version, id) VALUES (1, 'Phantom', 0, 1000);
TX2: COMMIT;
TX1: SELECT COUNT(*) FROM post_comment WHERE post_id = 1;
TX1: > 3
TX1: COMMIT;
TX3: SELECT * FROM post_comment;
> 0;"Post comment 0";100;1
> 1;"Post comment 1";100;1
> 2;"Post comment 2";100;1
> 1000;"Phantom";0;1

As expected, the SERIALIZABLE isolation level preserved the snapshot taken at the start of transaction TX1, so TX1 sees only 3 post_comment rows.

Because of the MVCC model, both Oracle and PostgreSQL allow TX2 to insert a new record and commit.

Why is TX1 allowed to commit? Since this is a phantom read anomaly, I expected TX1 to be rolled back with a "serialization failure" exception or something similar.

Does the MVCC Serializable model in PostgreSQL and Oracle only offer snapshot isolation, without detecting the phantom read anomaly?

UPDATE

I even changed Tx1 to issue an UPDATE statement that changes the version column for all post_comment entries belonging to the same post.

This way, Tx2 inserts a new record, and Tx1 commits without knowing that a new row matching its UPDATE filtering criteria has been added.

In fact, the only way to make this fail on PostgreSQL is to execute the following COUNT query in Tx2 before inserting the phantom row:

Tx2: SELECT COUNT(*) FROM post_comment WHERE post_id = 1 AND version = 0;
Tx2: INSERT INTO post_comment (post_id, review, version, id) VALUES (1, 'Phantom', 0, 1000);
Tx2: COMMIT;

Then Tx1 is rolled back with:

org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
  Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
  Hint: The transaction might succeed if retried.

Most likely, the write-skew anomaly prevention mechanism detected this change and rolled back the transaction.

Interestingly, Oracle does not seem to be bothered by this anomaly, so Tx1 just commits successfully. Since Oracle does not prevent write skew, Tx1 commits just fine.

By the way, you can run all of these examples yourself, since they are available on GitHub.

4 answers

I like this question because it demonstrates that the SQL Standard definition of Phantom Read only describes the effect, without specifying the root cause of this data anomaly:

P3 ("Phantom"): SQL transaction T1 reads a set of rows N that satisfy some. SQL transaction T2, then executes SQL statements that generate one or more rows that satisfy the used SQL transaction T1. If the SQL transaction T1 then repeats the initial read with the same, it gets a different set of rows.

The 1995 paper A Critique of ANSI SQL Isolation Levels, by Jim Gray et al., described the Phantom Read as:

P3: r1 [P] ... w2 [y in P] ... (c1 or a1) (Phantom)

It is important to note that ANSI SQL P3 only prohibits inserts (and updates, according to some interpretations) against the predicate, while the P3 definition above forbids any write to a record satisfying the predicate once the predicate has been read; the write can be an insert, update, or delete.
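
To make the distinction concrete, here is a minimal sketch against the post_comment schema from the question, where the conflicting write is a DELETE rather than an INSERT; the broader P3 definition above covers this history, while an inserts-only reading of ANSI SQL P3 would not:

-- TX1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;  -- r1[P], with P: post_id = 1

-- TX2 (concurrently)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DELETE FROM post_comment WHERE id = 2;                -- w2[y in P]: writes a row that satisfied P
COMMIT;

-- TX1
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;  -- T1 repeats the read before c1
COMMIT;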

Consequently, a Phantom Read does not mean that you can simply return the snapshot taken at the start of the current transaction and pretend that providing the same result for the repeated query protects you against the actual Phantom Read anomaly.

In the original SQL Server 2PL (two-phase locking) implementation, returning the same result for a query implied Predicate Locks.

The MVCC (Multi-Version Concurrency Control) Snapshot Isolation model (wrongly named Serializable in Oracle) does not actually prevent other transactions from inserting or deleting rows that match the filtering criteria of a query that has already been executed and returned a result set in the currently running transaction.

For this reason, we can envision the following scenario in which we want to apply a raise to all employees:

  • Tx1: SELECT SUM(salary) FROM employee where company_id = 1;
  • Tx2: INSERT INTO employee (id, name, company_id, salary) VALUES (100, 'John Doe', 1, 100000);
  • Tx1: UPDATE employee SET salary = salary * 1.1;
  • Tx2: COMMIT;
  • Tx1: COMMIT;
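
Here is a runnable sketch of the interleaving above, in PostgreSQL syntax, assuming a hypothetical employee table that is not part of the question's schema (with Oracle's SERIALIZABLE level, i.e. snapshot isolation, the flow is the same):

CREATE TABLE employee (
    id         INT8 PRIMARY KEY,
    name       VARCHAR(255),
    company_id INT8,
    salary     NUMERIC(12, 2)
);

-- Session 1 (Tx1, the CEO): snapshot isolation, i.e. REPEATABLE READ in PostgreSQL
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT SUM(salary) FROM employee WHERE company_id = 1;

-- Session 2 (Tx2, HR)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO employee (id, name, company_id, salary)
VALUES (100, 'John Doe', 1, 100000);

-- Session 1
UPDATE employee SET salary = salary * 1.1;

-- Session 2
COMMIT;

-- Session 1
COMMIT;
-- Both commits succeed: the UPDATE in Tx1 only sees the rows from its snapshot,
-- so John Doe's row keeps its original salary and no error is raised.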

In this scenario, the CEO runs the first transaction (Tx1), so:

  • First, she checks the sum of all salaries in her company.
  • Meanwhile, the HR department runs the second transaction (Tx2) because they have just hired John Doe and given him a $100,000 salary.
  • The CEO decides that a 10% raise is feasible considering the current total amount of salaries, not suspecting that the salary sum has just increased by $100k.
  • Meanwhile, the HR transaction Tx2 is committed.
  • The CEO transaction Tx1 is committed.

Boom! The CEO made her decision based on an old snapshot, giving a raise that might not be sustainable with the newly updated salary budget.

You can find a detailed explanation of this use case (with lots of diagrams) in the following post.

Is it a Phantom Read or Write Skew?

According to Jim Gray et al., this is a Phantom Read, since Write Skew is defined as:

A5B Write Skew: Suppose T1 reads x and y, which are consistent with C(), and then T2 reads x and y, writes x, and commits. Then T1 writes y. If there were a constraint between x and y, it might be violated. In terms of histories:

A5B: r1 [x] ... r2 [y] ... w1 [y] ... w2 [x] ... (c1 and c2 occur)

In Oracle, the transaction manager might or might not detect the anomaly above, because Oracle does not use predicate locks or index range locks (next-key locks) like MySQL does.
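
For comparison, a rough sketch of how MySQL/InnoDB's next-key locks typically behave under SERIALIZABLE, assuming the same post_comment table (the foreign key gives it an index on post_id): plain SELECTs become locking reads, so the range read in session 1 blocks the conflicting insert:

-- Session 1 (MySQL/InnoDB)
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;
-- Under SERIALIZABLE, InnoDB treats this as a locking read and takes
-- next-key (index-range) locks on the post_id = 1 range.

-- Session 2
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
INSERT INTO post_comment (post_id, review, version, id)
VALUES (1, 'Phantom', 0, 1000);
-- Blocks (or eventually times out) until session 1 commits or rolls back,
-- so the phantom row cannot appear while session 1 is still running.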

PostgreSQL manages to catch this anomaly only if Tx2 (the HR transaction) also issues a read against the employee table; otherwise, the phenomenon is not prevented.

UPDATE

In the beginning, I assumed that Serializability would also imply a time ordering. However, as very well explained by Peter Bailis, wall-clock ordering, or Linearizability, is only assumed for Strict Serializability.

Therefore, my assumptions were made for a Strict Serializable system. But that is not what Serializable offers. The Serializable isolation model makes no guarantees with respect to time, and operations are allowed to be reordered as long as they are equivalent to some serial execution.

Therefore, according to the definition of Serializable, such a Phantom Read can occur if the second transaction does not issue any read. But in a Strict Serializable model, as offered by 2PL, the Phantom Read would be prevented even if the second transaction does not issue a read against the same records that we are trying to protect from phantom writes.


What you are observing is not a phantom read. That would be if a new row showed up when the query is issued a second time (phantoms appear unexpectedly).

You are protected from phantom reads in both Oracle and PostgreSQL with SERIALIZABLE isolation.

The difference between Oracle and PostgreSQL is that SERIALIZABLE isolation in Oracle only offers snapshot isolation (which is good enough to keep phantoms from appearing), while in PostgreSQL it guarantees true serializability (i.e. there always exists a serialization of the SQL statements that leads to the same results). If you want to get the same behavior in Oracle and PostgreSQL, use the REPEATABLE READ isolation level in PostgreSQL.
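
As a minimal sketch, this is how you would explicitly request snapshot-isolation semantics in PostgreSQL, matching what Oracle's SERIALIZABLE level actually provides:

-- PostgreSQL: REPEATABLE READ is implemented as snapshot isolation,
-- so this transaction behaves like Oracle's SERIALIZABLE level:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;
-- The snapshot taken by the first query stays stable for the whole transaction,
-- but no read/write dependency checking is performed at commit time.
COMMIT;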


The Postgres documentation defines phantom read as:

A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.

Since your SELECT returns the same count both before and after the other transaction commits, it does not meet the criteria for a phantom read.


I just wanted to point out that Vlad Mihalcea's answer is wrong.

Is it a Phantom Read or Write Skew?

Neither of them: there is no anomaly, and the transactions serialize as Tx1 → Tx2.

The SQL standard states: "A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions."

PostgreSQL manages to catch this anomaly only if Tx2 (the HR transaction) also issues a read against the employee table; otherwise, the phenomenon is not prevented.

PostgreSQL's behavior is 100% correct; it simply "flips" the apparent order of the transactions.
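
A sketch of that equivalent serial order, using the schema and data from the question: running Tx1 to completion and then Tx2 yields exactly the final state produced by the interleaved execution.

-- Tx1 runs first, alone:
BEGIN;
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;      -- 3
UPDATE post_comment SET version = 100 WHERE post_id = 1;  -- touches ids 0, 1, 2
SELECT COUNT(*) FROM post_comment WHERE post_id = 1;      -- 3
COMMIT;

-- Tx2 runs second, alone:
BEGIN;
INSERT INTO post_comment (post_id, review, version, id)
VALUES (1, 'Phantom', 0, 1000);
COMMIT;

-- Final state: ids 0, 1, 2 have version = 100 and id 1000 has version = 0,
-- which is the same result the concurrent execution produced.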

