Is there a way to multithread SqlDataReader?

I have a SQL query that returns more than half a million rows for processing. The processing does not take very long per row, but I would like to speed it up with some multithreading. Given the code below, is multithreading something like this possible?

    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // ... process row
        }
    }

It would be ideal if I could just get one cursor at the beginning and another at the middle of the result set. That way I could process the records with two threads. However, SqlDataReader does not allow me to do this.

Any idea how I could achieve this?

+6
performance multithreading sql sql-server
3 answers

Set up a producer/consumer queue: one producer thread drains the reader and enqueues the records as fast as it can, but does none of the "processing". Then some number of consumer threads (how many depends on your system) dequeue and process each record.
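A minimal sketch of that producer/consumer arrangement using BlockingCollection. The connection string, query, and ProcessRow are placeholders for your own; the bounded capacity keeps the producer from racing arbitrarily far ahead of the consumers:

```csharp
using System;
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading.Tasks;

class ReaderPipeline
{
    static void Main()
    {
        const string connectionString = "...";                     // placeholder
        const string query = "SELECT Id, Payload FROM dbo.Records"; // placeholder

        using (var queue = new BlockingCollection<object[]>(boundedCapacity: 10000))
        {
            // Producer: drain the reader as fast as possible, no processing here.
            var producer = Task.Run(() =>
            {
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand(query, connection))
                {
                    connection.Open();
                    using (var reader = command.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            var row = new object[reader.FieldCount];
                            reader.GetValues(row); // copy the row out of the reader
                            queue.Add(row);        // blocks if the queue is full
                        }
                    }
                }
                queue.CompleteAdding(); // tell consumers no more rows are coming
            });

            // Consumers: one per core is a reasonable starting point.
            var consumers = new Task[Environment.ProcessorCount];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (var row in queue.GetConsumingEnumerable())
                        ProcessRow(row); // your per-row work goes here
                });
            }

            producer.Wait();
            Task.WaitAll(consumers);
        }
    }

    static void ProcessRow(object[] row) { /* placeholder for the real work */ }
}
```

The key point is that only one thread ever touches the SqlDataReader; the parallelism is applied to the processing, not the reading.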

+6

Ideally, you should not have to read that many rows on the client at all.

That said, you can split your query into several queries and execute them in parallel. This means running multiple SqlCommands on separate threads and draining each partition of the result on its own thread. The A+ question is how to partition the result, and that depends heavily on your data and your query:

  • You can use a key range (e.g. ID BETWEEN 1 AND 10000 , ID BETWEEN 10001 AND 20000 , etc.)
  • You can use an attribute (e.g. RecordTypeID IN (1,2) , RecordTypeID IN (3,4) , etc.)
  • You can use a synthetic range (i.e. ROW_NUMBER() BETWEEN 1 AND 1000 , etc.), but this is very problematic to get right
  • You can use a hash (e.g. BINARY_CHECKSUM(*)%10 == 0 , BINARY_CHECKSUM(*)%10 == 1 , etc.)

You just have to be very careful that the partition queries do not overlap and block each other at run time (i.e., scan the same records and acquire X locks), thereby serializing one another.
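A sketch of the first option (key-range partitioning), assuming a numeric Id column with a known span; each task opens its own connection and drains its own partition. The table name, column names, and ProcessRow are placeholders:

```csharp
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

class PartitionedQuery
{
    const string connectionString = "..."; // placeholder

    static void Run()
    {
        // Non-overlapping key ranges; adjust to your actual key distribution.
        var ranges = new (int Lo, int Hi)[] { (1, 250000), (250001, 500000) };

        var tasks = ranges.Select(r => Task.Run(() =>
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT Id, Payload FROM dbo.Records WHERE Id BETWEEN @lo AND @hi",
                connection))
            {
                command.Parameters.AddWithValue("@lo", r.Lo);
                command.Parameters.AddWithValue("@hi", r.Hi);
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        ProcessRow(reader); // your per-row work goes here
                }
            }
        })).ToArray();

        Task.WaitAll(tasks);
    }

    static void ProcessRow(SqlDataReader reader) { /* placeholder */ }
}
```

Each connection gets its own SqlDataReader, so no reader is ever shared across threads; whether this actually speeds things up depends on whether the server and disks can serve the partitions concurrently.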

+3

Is this a simple ranged query, like WHERE Id BETWEEN 1 AND 500000? If so, you can just kick off N queries, each returning 1/N of the range. But it helps to know where your bottleneck is in the single-threaded approach. If you are doing sequential reads from a single drive to satisfy the query, you should probably stick with a single thread. If the data is split across spindles by some range, you can tune your queries intelligently to maximize throughput from the disks (i.e., read from each disk in parallel with separate queries). If you expect all the rows to be in memory, then you can parallelize to your heart's content.

But if the query is more complex, you may not be able to split it up without incurring a lot of overhead. In most cases the options above will not apply well, and the producer/consumer approach Joel mentions will be the only place to parallelize. Depending on how much time you spend processing each row, even that may provide only trivial gains.

0
