SELECT FOR UPDATE vs. UPDATE, then SELECT

I created a service application that uses multithreading to process data in an InnoDB table in parallel (about 2-3 million records; the application issues no queries other than the InnoDB ones listed below). Each thread runs the following queries against that table:

  • START TRANSACTION
  • SELECT FOR UPDATE (SELECT pk FROM table WHERE status = 'new' LIMIT 100 FOR UPDATE)
  • UPDATE (UPDATE table SET status = 'locked' WHERE pk BETWEEN X AND Y)
  • COMMIT
  • DELETE (DELETE FROM table WHERE pk BETWEEN X AND Y)
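Written out as SQL, the per-thread cycle looks like this (a sketch using the table and column names from the question; `table` is a placeholder name, and X and Y stand for the min/max pk values returned by the SELECT):

```sql
START TRANSACTION;

-- Claim a batch of 100 unprocessed rows and lock them.
SELECT pk FROM table WHERE status = 'new' LIMIT 100 FOR UPDATE;

-- Mark the claimed range so other threads skip it
-- (X and Y come from the SELECT above).
UPDATE table SET status = 'locked' WHERE pk BETWEEN X AND Y;

COMMIT;

-- After processing, remove the finished rows.
DELETE FROM table WHERE pk BETWEEN X AND Y;
```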

The folks at forum.percona.com gave me some advice: don't use SELECT FOR UPDATE followed by UPDATE, because the transaction takes longer to complete (two queries) and holds its locks longer. Their suggested approach (with autocommit enabled) was:

  • UPDATE (UPDATE table SET status = 'locked', thread = Z LIMIT 100)
  • SELECT (SELECT pk FROM table WHERE thread = Z)
  • DELETE (DELETE FROM table WHERE pk BETWEEN X AND Y)
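The suggested variant, sketched as SQL (the UPDATE presumably also filters on status = 'new', which is an assumption since the question omits the WHERE clause; Z is the claiming thread's id):

```sql
-- With autocommit enabled, each statement is its own short transaction.
-- Claim and tag a batch of 100 rows in a single statement.
UPDATE table SET status = 'locked', thread = Z
 WHERE status = 'new'
 LIMIT 100;

-- Read back the rows this thread just claimed.
SELECT pk FROM table WHERE thread = Z;

-- After processing, remove the finished rows.
DELETE FROM table WHERE pk BETWEEN X AND Y;
```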

That was supposed to improve performance. Instead, I got even more deadlocks and lock wait timeouts than before...

I have read a lot about InnoDB optimization and configured the server carefully, so I believe the InnoDB settings are 99% fine. This is also supported by the fact that the first scenario works well, and better than the second. My my.cnf:

innodb_buffer_pool_size = 512M
innodb_thread_concurrency = 16
innodb_thread_sleep_delay = 0
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit = 2

Any ideas why the optimization was not successful?

performance mysql innodb
1 answer

What I understand from the description of your process:

  • You have a table with many rows that need to be processed.
  • You select rows from this table (using FOR UPDATE) so that other threads cannot claim the same rows.
  • When you are done claiming them, you update the rows and commit the transaction.
  • Then you delete the rows from the table.

If so, then you are doing the right thing, as this causes less lock contention than the second approach you described.

You can reduce lock contention further by removing the per-batch DELETE statement, as it locks the whole range. Instead, add a flag (a new column named processed) and update it, then delete the rows in one pass at the end, once all threads have finished.
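A sketch of that flag-based cleanup (the processed column and its type are illustrative, not from the original schema):

```sql
-- One-time, hypothetical schema change: add a flag instead of
-- deleting inside each batch.
ALTER TABLE table ADD COLUMN processed TINYINT NOT NULL DEFAULT 0;

-- Each thread marks its batch as done instead of deleting it:
UPDATE table SET processed = 1 WHERE pk BETWEEN X AND Y;

-- A single cleanup pass after all threads have finished:
DELETE FROM table WHERE processed = 1;
```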

You can also distribute the work more intelligently by partitioning the workload up front: assign each thread a range of rows (by PK, say) to process. Each thread can then use a plain SELECT with no FOR UPDATE clause at all, and it will be fast.
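For example, assuming a contiguous PK and a fixed batch size (BATCH and the range arithmetic are illustrative assumptions), each thread Z could read only its own disjoint slice:

```sql
-- Thread Z owns pk values in [Z * BATCH, (Z + 1) * BATCH - 1].
-- No two threads touch the same rows, so no row locks are needed:
SELECT pk FROM table
 WHERE pk BETWEEN Z * BATCH AND (Z + 1) * BATCH - 1
   AND status = 'new';
```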

