Locking when inserting into database tables from a multi-threaded application

I have a process with multiple threads.

The process has a thread-safe collection of items to process.

Each thread processes items from the collection in a loop.

Each thread passes its item to a stored procedure that inserts data into 3 tables inside a transaction (in SQL). If one insert fails, all three fail. Note that the transaction scope is per item.

The inserts are pretty simple: just one row (related by a foreign key) is inserted into each table, keyed by an identity seed. No reading; just insert and move on to the next item.
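To make that concrete, the procedure is shaped roughly like this (a simplified sketch; the real table, column, and procedure names differ), shown here as the C# constant it could be deployed from:

```csharp
static class Schema
{
    // Simplified sketch of the stored procedure (real names differ).
    // One row goes into a master table and one row into each of two
    // child tables that reference it; any failure rolls back all three.
    public const string CreateInsertItemProc = @"
CREATE PROCEDURE dbo.InsertItem @Payload nvarchar(200)
AS
BEGIN
    SET XACT_ABORT ON;   -- any error aborts and rolls back the transaction
    BEGIN TRANSACTION;
        INSERT INTO dbo.Master (Payload) VALUES (@Payload);

        DECLARE @masterId int = SCOPE_IDENTITY();  -- identity value, not a GUID

        INSERT INTO dbo.ChildA (MasterId) VALUES (@masterId);  -- FK to Master
        INSERT INTO dbo.ChildB (MasterId) VALUES (@masterId);  -- FK to Master
    COMMIT TRANSACTION;
END";
}
```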

If I have several threads processing their own items, each inserting into the same set of tables, will they cause lock contention, timeouts, or other problems due to transaction locking?

I know I need one DB connection per thread; my main concern is the level at which tables are locked within each transaction. When one thread is inserting rows into the 3 tables, will the remaining threads have to wait? There is no dependency between the rows in the tables, other than that the auto-identity must be incremented. If a table-level lock is needed to increment the identity, I suppose the other threads will have to wait. The inserts themselves may or may not be quick. If the threads end up waiting on each other, does multithreading even make sense?
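For reference, each thread's loop looks roughly like this (a sketch, assuming .NET 4's ConcurrentQueue as the thread-safe collection; the queue, procedure name, and connection string are placeholders):

```csharp
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;

class InsertWorker
{
    // Placeholder names; several threads each run ProcessLoop.
    static readonly ConcurrentQueue<string> Items = new ConcurrentQueue<string>();
    const string ConnStr = "Data Source=.;Initial Catalog=MyDb;Integrated Security=true";

    static void ProcessLoop()
    {
        // One connection per thread; SqlConnection is not thread-safe.
        using (var conn = new SqlConnection(ConnStr))
        {
            conn.Open();
            string item;
            while (Items.TryDequeue(out item))
            {
                using (var cmd = new SqlCommand("dbo.InsertItem", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@Payload", item);
                    cmd.ExecuteNonQuery();  // the proc's transaction covers all 3 inserts
                }
            }
        }
    }
}
```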

The goal of the multithreading is to speed up the processing of the items.

Share your experience.

PS: The identity seed is not a GUID.

+8
multithreading c# sql-server sql-server-2008
4 answers

In SQL Server, multiple inserts into the same table normally do not block each other on their own. The IDENTITY generation mechanism is highly concurrent, so it does not serialize access. Inserts may block each other if they insert the same key into a unique index (one of them will also hit a duplicate-key violation if both attempt to commit). There is also a probability game because lock resources are hashed, but it only comes into play in large transactions; see %%LOCKRES%% COLLISION PROBABILITY MAGIC MARKER: 16,777,215. If the transaction inserts into several tables, there should again be no conflict as long as the inserted keys are disjoint (which happens naturally if the inserts are master-child-child).

That said, the presence of secondary indexes, and especially foreign key constraints, may introduce blocking and possible deadlocks. Without the exact schema definition it is impossible to say whether or not you are exposed to deadlocks. Any other workload (reports, reads, maintenance) also adds to the contention and can potentially cause blocking and deadlocks.

Really, really high-end deployments (the kind that don't need to ask forums for advice...) can suffer from insert hot-spot symptoms; see Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads.

BTW, issuing INSERTs from multiple threads is very rarely the right answer to increasing load throughput. See The Data Loading Performance Guide for good advice on how to solve that problem. And one last tip: multiple threads are also rarely the answer to making any program faster. Asynchronous programming is almost always the answer. See AsynchronousProcessing and BeginExecuteNonQuery.
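For illustration, the Begin/End pattern looks roughly like this (a minimal sketch; the procedure name and connection string are placeholders, and the connection string must include Asynchronous Processing=true for this API):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class AsyncInsert
{
    static void Main()
    {
        // Asynchronous Processing=true is required for Begin/EndExecuteNonQuery.
        const string connStr =
            "Data Source=.;Initial Catalog=MyDb;Integrated Security=true;" +
            "Asynchronous Processing=true";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("dbo.InsertItem", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Payload", "example item");

            conn.Open();

            // Kick off the insert without tying up the calling thread...
            IAsyncResult ar = cmd.BeginExecuteNonQuery();

            // ...do other useful work here while SQL Server runs the insert...

            // ...then harvest the result (blocks only if it is not done yet).
            int rows = cmd.EndExecuteNonQuery(ar);
            Console.WriteLine("Rows affected: " + rows);
        }
    }
}
```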

As a side note:

just one row (related by a foreign key) is inserted into each table, ... No reading

That statement actually contradicts itself: foreign keys are read, since they must be validated during the write.

+4

What makes you think it has to be a table-level lock just because there's an identity column? I don't see that anywhere in the documentation, and I just tested an insert WITH (ROWLOCK) on a table with an identity column; it works.

To minimize blocking, take row-level locks. Have all stored procedures update the tables in the same order.
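For example (placeholder names), a row-lock hint on the insert looks like this:

```csharp
static class Sql
{
    // WITH (ROWLOCK) requests row-level locks on the insert target
    // (escalation can still occur under memory pressure).
    // Every procedure should touch the tables in the same order
    // (Master, then ChildA, then ChildB) to avoid deadlocks.
    public const string InsertMaster =
        "INSERT INTO dbo.Master WITH (ROWLOCK) (Payload) VALUES (@Payload);";
}
```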

Do your inserts into the three tables really take up to ten seconds each? I have some transactional inserts that hit several tables (some of them large) and get 100 per second.

Review the table design and its keys. If you can choose a clustered PK that matches your insert order, and if you can sort before inserting, it will make a huge difference. Review the need for any other indexes. If you must have other indexes, then monitor their fragmentation and defragment as needed.

Related, but not the same: I have a data loader that needs to parse some data and then load millions of rows nightly, though not in a transaction. It was optimized for 4 parallel processes starting from empty tables, but the problem was that after two hours of loading, throughput had dropped 10x due to fragmentation. I reworked the tables so the clustered PK index matched the insert order, and dropped any other index that didn't give at least a 50% selectivity benefit. For the nightly insert I first drop (disable) the indexes and use only two threads, one to parse and one to insert, then rebuild the indexes at the end of the load. That gave a 100:1 improvement over 4 threads pounding the indexes. Yes, your problem is different, but review your tables. Too often, I think, indexes are added for small select benefits without considering the cost to inserts and updates. Also, the select benefit is often evaluated when the index is freshly created, and a fresh index has no fragmentation.
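As a rough sketch of the disable/rebuild steps (index and table names invented):

```csharp
static class NightlyLoad
{
    // Before the load: disable nonclustered indexes so inserts only
    // maintain the clustered index. Do not disable the clustered index
    // itself, as that makes the table inaccessible.
    public const string DisableIndex =
        "ALTER INDEX IX_Master_Lookup ON dbo.Master DISABLE;";

    // After the load: rebuild everything, which also re-enables
    // the disabled indexes.
    public const string RebuildIndexes =
        "ALTER INDEX ALL ON dbo.Master REBUILD;";
}
```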

+2

Heavy-duty DBMSs such as MSSQL are generally very good at handling concurrency. What exactly happens with your concurrent transactions depends largely on your transaction isolation level ( http://msdn.microsoft.com/en-us/library/ms175909%28v=sql.105%29.aspx ), which you can set as you see fit, but in this case I don't think you need to worry about deadlocks.
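If you do want to control it explicitly, the isolation level can be chosen per transaction from the client; a minimal sketch (placeholder connection string and procedure name):

```csharp
using System.Data;
using System.Data.SqlClient;

static class IsolationExample
{
    // Placeholder connection string and procedure name.
    public static void InsertUnderReadCommitted(string payload)
    {
        using (var conn = new SqlConnection(
            "Data Source=.;Initial Catalog=MyDb;Integrated Security=true"))
        {
            conn.Open();

            // The isolation level is selected here, per transaction.
            using (SqlTransaction tx =
                conn.BeginTransaction(IsolationLevel.ReadCommitted))
            using (var cmd = new SqlCommand("dbo.InsertItem", conn, tx))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@Payload", payload);
                cmd.ExecuteNonQuery();
                tx.Commit();
            }
        }
    }
}
```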

Whether it makes sense or not is always hard to guess without knowing anything about your system; it's not hard to try, though, so you can find out for yourself. If I had to guess, I'd say it won't help you much if all your threads do is insert rows in round-robin fashion.

+1

Other threads will wait anyway; your PC can't really execute more threads than it has processor cores at any given moment.

You wrote that you want to use multithreading to speed up processing. I'm not sure that is something you can take as a given automatically. The degree of parallelism and its effect on processing speed depend on many factors that are very process-dependent, for example whether IO is involved, or whether each thread does only in-memory processing. This, I think, is one of the reasons Microsoft offers the task scheduler in its TPL framework and generally treats the degree of concurrency in that library as something to tune at runtime.

I think your safest bet is to run test queries/processes to see exactly what happens (although, of course, it still won't be 100% representative). You can also check out SQL Server's optimistic concurrency features, which let you work without blocking (I'm not sure how they handle identity columns, though).
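For reference, those optimistic (row-versioning) options are enabled per database; a sketch with a placeholder database name:

```csharp
static class OptimisticConcurrency
{
    // One-time database settings (database name is a placeholder).
    // READ_COMMITTED_SNAPSHOT makes readers use row versions instead of
    // shared locks; ALLOW_SNAPSHOT_ISOLATION additionally enables explicit
    // SNAPSHOT transactions. Switching READ_COMMITTED_SNAPSHOT on requires
    // no other active connections to the database.
    public const string EnableRowVersioning = @"
        ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
        ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;";
}
```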

0
