Parallel.For loop freezes or runs slower than a regular loop

I'm trying to add rows to a DataTable in parallel, but if the loop is long, it freezes or just takes a lot of time, more time than a regular loop. This is my Parallel.For code:

    Parallel.For(1, linii.Length, index => {
        DataRow drRow = dtResult.NewRow();
        alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        for (int i = 0; i < alResult.Count; i++) {
            drRow[i] = alResult[i];
        }
        dtResult.Rows.Add(drRow);
    });

Why does this Parallel.For loop take much longer than a regular loop? What am I doing wrong?

Thanks!

+4
2 answers

You cannot mutate a DataTable from two different threads; it will error. DataTable makes no attempt to be thread-safe. So: don't do that. Just do it from one thread. Most likely you are limited by IO, so you should just do it on a single thread, as a stream. It looks like you are processing text data, and it seems you have a string[] for the lines, perhaps from File.ReadAllLines()? Well, that is bad here:

  • it forces everything to load into memory
  • you have to wait for all of it to load before you can start processing
  • CSV is a multi-line format; it is not guaranteed that 1 line == 1 row (see the example just below)
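
To illustrate that last point, here is a hypothetical two-column CSV fragment in which a quoted field contains a line break. A line-based reader sees three lines, but a CSV parser should see a header plus exactly one row:

    Name,Comment
    "Smith, John","first line of the comment
    second line of the same field"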

What you should do is use something like CsvReader from CodeProject, but even if you just want to process one line at a time, use a StreamReader:

    using (var file = File.OpenText(path)) {
        string line;
        while ((line = file.ReadLine()) != null) {
            // process this line
            alResult = CSVParser(line, txtDelimiter, txtQualifier);
            DataRow drRow = dtResult.NewRow(); // create a fresh row per line
            for (int i = 0; i < alResult.Count; i++) {
                drRow[i] = alResult[i];
            }
            dtResult.Rows.Add(drRow);
        }
    }

This will not be faster using Parallel, nor should it be: IO is your bottleneck here. Locking around the DataTable would be an option, but it won't help you massively.

As an aside, I notice that alResult is not declared inside the loop. That means that in your original code alResult is a captured variable, shared between all the iterations of the loop, which means you were already trampling over each row horribly.
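
A minimal sketch of that particular fix, reusing the names from the question (CSVParser, linii, txtDelimiter, txtQualifier) and ignoring the separate DataTable thread-safety problem: declare the parse result as a local inside the lambda, so each iteration gets its own copy:

    Parallel.For(1, linii.Length, index => {
        // 'var' makes alResult local to this iteration, instead of one
        // captured variable being overwritten concurrently by every thread
        var alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        // ... build the row from alResult ...
        // (mutating dtResult here would still need a lock, as discussed above)
    });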


Edit: an illustration of why Parallel is not relevant when reading 1,000,000 lines from a file:

Approach 1: use ReadAllLines to load the lines, then use Parallel to process them; this costs [fixed time] for the physical file IO, and then we parallelise. The CPU work is minimal, and we have essentially spent [fixed time] anyway. However, we have added lots of threading overhead and memory overhead, and we couldn't even start until the entire file had loaded.

Approach 2: use a streaming API; read each line one by one, processing and adding it as you go. The cost here is basically, again: [fixed time] for the actual IO bandwidth to load the file. But now we have no threading overhead, no sync conflicts, no huge memory to allocate, and we start populating the table immediately.

Approach 3: if you really wanted to, a third approach would be a reader/writer queue, with one dedicated thread handling the file IO and enqueueing the lines, and a second thread doing the DataTable work. Frankly, that is a lot more moving parts, and the second thread will spend 95% of its time waiting for data from the file; stick with approach 2! (A sketch of approach 3 follows below, purely for illustration.)
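
For completeness, here is a minimal sketch of what approach 3 could look like, using a BlockingCollection<string> as the queue (needs System.Collections.Concurrent, System.Data, System.IO and System.Threading.Tasks; path, CSVParser, txtDelimiter, txtQualifier and dtResult are the names from the question, and CSVParser is assumed to return an indexable collection):

    var queue = new BlockingCollection<string>(boundedCapacity: 1000);

    // producer: a dedicated thread does nothing but file IO and enqueueing
    var producer = Task.Factory.StartNew(() => {
        using (var file = File.OpenText(path)) {
            string line;
            while ((line = file.ReadLine()) != null) {
                queue.Add(line); // blocks if the consumer falls behind
            }
        }
        queue.CompleteAdding(); // signal end of file
    });

    // consumer: a single thread owns dtResult, so no locking is needed
    foreach (var line in queue.GetConsumingEnumerable()) {
        var alResult = CSVParser(line, txtDelimiter, txtQualifier);
        DataRow drRow = dtResult.NewRow();
        for (int i = 0; i < alResult.Count; i++) {
            drRow[i] = alResult[i];
        }
        dtResult.Rows.Add(drRow);
    }
    producer.Wait(); // surface any IO exceptions from the reader thread

As the answer says, though, the consumer will spend most of its time waiting on the producer, so approach 2 is the better trade-off.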

+5
    Parallel.For(1, linii.Length, index => {
        // parse outside the lock; 'var' keeps alResult local to the iteration
        var alResult = CSVParser(linii[index], txtDelimiter, txtQualifier);
        lock (dtResult) {
            // DataTable is not thread-safe, so all access to it is serialised
            DataRow drRow = dtResult.NewRow();
            for (int i = 0; i < alResult.Count; i++) {
                drRow[i] = alResult[i];
            }
            dtResult.Rows.Add(drRow);
        }
    });
+1
