Parallel.For loop freezes or runs slower than a regular loop

I'm trying to add rows to a DataTable in parallel, but if the loop is long, it freezes or just takes a lot of time, more time than a regular loop. This is my Parallel.For code:

    Parallel.For(1, linii.Length, index => {
        DataRow drRow = dtResult.NewRow();
        alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        for (int i = 0; i < alResult.Count; i++) {
            drRow[i] = alResult[i];
        }
        dtResult.Rows.Add(drRow);
    });

Why does this Parallel.For loop take much longer than a regular loop? What am I doing wrong?

Thanks!

+4
2 answers

You cannot mutate a DataTable from two different threads; it will error. DataTable makes no attempt to be thread-safe. So: don't do that. Just do it from one thread. Most likely you are limited by IO, so you should just do it on a single thread, as a stream. It looks like you are processing text data, and it seems you have a string[] for the lines, perhaps from File.ReadAllLines()? Well, that is bad here:

  • it forces everything to load into memory
  • you have to wait for all of it to load before you can start processing
  • CSV is a multi-line format; it is not guaranteed that 1 line == 1 row (see the example just below)
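
To illustrate that last point, here is a hypothetical two-column CSV fragment in which a quoted field contains a line break. A line-based reader sees three lines, but a CSV parser should see a header plus exactly one row:

    Name,Comment
    "Smith, John","first line of the comment
    second line of the same field"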

What you should do is use something like CsvReader from CodeProject, but even if you just want to process one line at a time, use a StreamReader:

    using (var file = File.OpenText(path)) {
        string line;
        while ((line = file.ReadLine()) != null) {
            // process this line
            alResult = CSVParser(line, txtDelimiter, txtQualifier);
            DataRow drRow = dtResult.NewRow(); // create a fresh row per line
            for (int i = 0; i < alResult.Count; i++) {
                drRow[i] = alResult[i];
            }
            dtResult.Rows.Add(drRow);
        }
    }

This will not be faster using Parallel, nor should it be: IO is your bottleneck here. Locking around the DataTable would be an option, but it won't help you massively.

As an aside, I notice that alResult is not declared inside the loop. That means that in your original code alResult is a captured variable, shared between all the iterations of the loop, which means you were already trampling over each row horribly.
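
A minimal sketch of that particular fix, reusing the names from the question (CSVParser, linii, txtDelimiter, txtQualifier) and ignoring the separate DataTable thread-safety problem: declare the parse result as a local inside the lambda, so each iteration gets its own copy:

    Parallel.For(1, linii.Length, index => {
        // 'var' makes alResult local to this iteration, instead of one
        // captured variable being overwritten concurrently by every thread
        var alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        // ... build the row from alResult ...
        // (mutating dtResult here would still need a lock, as discussed above)
    });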


Edit: an illustration of why Parallel is not relevant when reading 1,000,000 lines from a file:

Approach 1: use ReadAllLines to load the lines, then use Parallel to process them; this costs [fixed time] for the physical file IO, and then we parallelise. The CPU work is minimal, and we have essentially spent [fixed time] anyway. However, we have added lots of threading overhead and memory overhead, and we couldn't even start until the entire file had loaded.

Approach 2: use a streaming API; read each line one by one, processing and adding it as you go. The cost here is basically, again: [fixed time] for the actual IO bandwidth to load the file. But now we have no threading overhead, no sync conflicts, no huge memory to allocate, and we start populating the table immediately.

Approach 3: if you really wanted to, a third approach would be a reader/writer queue, with one dedicated thread handling the file IO and enqueueing the lines, and a second thread doing the DataTable work. Frankly, that is a lot more moving parts, and the second thread will spend 95% of its time waiting for data from the file; stick with approach 2! (A sketch of approach 3 follows below, purely for illustration.)
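
For completeness, here is a minimal sketch of what approach 3 could look like, using a BlockingCollection<string> as the queue (needs System.Collections.Concurrent, System.Data, System.IO and System.Threading.Tasks; path, CSVParser, txtDelimiter, txtQualifier and dtResult are the names from the question, and CSVParser is assumed to return an indexable collection):

    var queue = new BlockingCollection<string>(boundedCapacity: 1000);

    // producer: a dedicated thread does nothing but file IO and enqueueing
    var producer = Task.Factory.StartNew(() => {
        using (var file = File.OpenText(path)) {
            string line;
            while ((line = file.ReadLine()) != null) {
                queue.Add(line); // blocks if the consumer falls behind
            }
        }
        queue.CompleteAdding(); // signal end of file
    });

    // consumer: a single thread owns dtResult, so no locking is needed
    foreach (var line in queue.GetConsumingEnumerable()) {
        var alResult = CSVParser(line, txtDelimiter, txtQualifier);
        DataRow drRow = dtResult.NewRow();
        for (int i = 0; i < alResult.Count; i++) {
            drRow[i] = alResult[i];
        }
        dtResult.Rows.Add(drRow);
    }
    producer.Wait(); // surface any IO exceptions from the reader thread

As the answer says, though, the consumer will spend most of its time waiting on the producer, so approach 2 is the better trade-off.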

+5
    Parallel.For(1, linii.Length, index => {
        // parse outside the lock; 'var' keeps alResult local to the iteration
        var alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        lock (dtResult) {
            // DataTable is not thread-safe, so all access to it is serialised
            DataRow drRow = dtResult.NewRow();
            for (int i = 0; i < alResult.Count; i++) {
                drRow[i] = alResult[i];
            }
            dtResult.Rows.Add(drRow);
        }
    });
+1
