Writing to a file with multiple streams in C#

I am trying to transfer a large file (> 1 GB) from one server to another over HTTP. To do this, I make HTTP range requests in parallel, which lets me transfer the file in parallel.
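Each request asks for one slice of the file via a Range header; per request the setup looks roughly like this (a sketch; the real URL is elided as "Some URL" in the code below):

    // Request bytes [from, to] of the resource (inclusive); a server that
    // supports ranges replies with 206 Partial Content and just that slice.
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
    webRequest.AddRange(ranges[index].Item1, ranges[index].Item2);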

When saving to disk, I take each response stream, open the same file as a FileStream, seek to the range I want, and then write.

However, I find that all but one of my response streams time out. It looks like disk I/O cannot keep up with network I/O. Yet if I do the same thing but have each stream write to a separate file, it works fine.

For reference, here is the code that writes to a single file:

    int numberOfStreams = 4;
    List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
    string fileName = @"C:\MyCoolFile.txt";
    Exception exception = null; // set when a download fails
    //List populated here
    Parallel.For(0, numberOfStreams, (index, state) =>
    {
        try
        {
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
            using (Stream responseStream = webRequest.GetResponse().GetResponseStream())
            using (FileStream fileStream = File.Open(fileName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
            {
                fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
                byte[] buffer = new byte[64 * 1024];
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    if (state.IsStopped) { return; }
                    fileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
        catch (Exception e)
        {
            exception = e;
            state.Stop();
        }
    });

And here is the code that writes to separate files:

    int numberOfStreams = 4;
    List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
    string fileName = @"C:\MyCoolFile.txt";
    Exception exception = null; // set when a download fails
    //List populated here
    Parallel.For(0, numberOfStreams, (index, state) =>
    {
        try
        {
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
            using (Stream responseStream = webRequest.GetResponse().GetResponseStream())
            using (FileStream fileStream = File.Open(fileName + "." + index + ".tmp", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
            {
                fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
                byte[] buffer = new byte[64 * 1024];
                int bytesRead;
                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    if (state.IsStopped) { return; }
                    fileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
        catch (Exception e)
        {
            exception = e;
            state.Stop();
        }
    });

My question is: is there some additional checking or work that C#/Windows performs when writing to a single file from multiple streams that makes file I/O slower than writing to multiple files? Shouldn't all the disk operations be bounded by the speed of the disk either way? Can anyone explain this behavior?

Thanks in advance!

UPDATE: Here is the error that the source server is throwing:

"Cannot write data to the transport connection: the connection attempt failed because the connected party did not respond properly after some time or the connection failed because the connected host was unable to respond." [System.IO.IOException]: "Unable to write data to the transport connection: the connection attempt failed because the connected party did not respond properly after some time or the connection failed because the connected host was unable to respond." InnerException: "The connection attempt failed because the connected party did not respond properly after some time or the connection could not be established because the connected host was unable to respond" Message: "Unable to write data to the transport connection: connection attempt failed because the related party didn’t respond properly after some time or the connection failed because the connected host was unable to respond. " StackTrace: "in System.Net.Sockets.NetworkStream.Write (Byte [] buffer, Int32 offset, Int32 size) \ r \ n in System.Net.Security._SslStream.StartWriting (Byte [] buffer, Int32 offset, Int32 count , AsyncProtocolRequest asyncRequest) \ r \ n in System.Net.Security._SslStream.ProcessWrite (Byte buffer [], Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest) \ r \ n in System.Net.Security.SslStream.Write (Byte [ ], offset Int32, number Int32) \ r \ n

+5
5 answers

So, after trying all the suggestions, I ended up using a MemoryMappedFile and opening a MemoryMappedViewStream into it for each download thread:

    int numberOfStreams = 4;
    List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
    string fileName = @"C:\MyCoolFile.txt";
    Exception exception = null; // set when a download fails
    //Ranges list and fileSize (total length) populated here
    using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.OpenOrCreate, null, fileSize.Value, MemoryMappedFileAccess.ReadWrite))
    {
        Parallel.For(0, numberOfStreams, index =>
        {
            try
            {
                HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
                using (Stream responseStream = webRequest.GetResponse().GetResponseStream())
                using (MemoryMappedViewStream fileStream = mmf.CreateViewStream(ranges[index].Item1, ranges[index].Item2 - ranges[index].Item1 + 1, MemoryMappedFileAccess.Write))
                {
                    responseStream.CopyTo(fileStream);
                }
            }
            catch (Exception e)
            {
                exception = e;
            }
        });
    }
0

Unless you are writing to a striped RAID array, you are unlikely to see any performance benefit from writing to a file from multiple threads. In fact, it is more likely to be the other way around: parallel writes will be interleaved and cause random access, incurring disk seek latencies that make them orders of magnitude slower than large sequential writes.

For a sense of perspective, look at a latency comparison. A sequential 1 MB read from disk takes about 20 ms; a write takes approximately the same. Each disk seek, on the other hand, takes about 10 ms. If your writes are interleaved in 4 KB chunks, your 1 MB write will require an additional 2560 ms of seek time (1 MB / 4 KB = 256 chunks, each costing a ~10 ms seek), making it 100 times slower than a sequential one.

I suggest letting only one thread write to the file at any time and using parallelism only for the network transfer. You can use the producer-consumer pattern, where downloaded chunks are placed into a bounded concurrent collection (for example, BlockingCollection<T>) and then picked up and written to disk by a dedicated writer thread.
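A minimal sketch of that pattern, under a couple of assumptions: downloadRange is a hypothetical helper that fetches one HTTP range into memory (it would reuse the HttpWebRequest logic from the question), and error handling is elided for brevity:

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading.Tasks;

    static void DownloadParallel(string fileName, Tuple<int, int>[] ranges,
                                 Func<Tuple<int, int>, byte[]> downloadRange)
    {
        // Bounded so producers block instead of buffering the whole file in RAM.
        var chunks = new BlockingCollection<Tuple<long, byte[]>>(boundedCapacity: 8);

        // Producers: the network-bound work stays parallel.
        var producers = Task.Run(() =>
        {
            Parallel.ForEach(ranges, range =>
                chunks.Add(Tuple.Create((long)range.Item1, downloadRange(range))));
            chunks.CompleteAdding();
        });

        // Single consumer: the only thread touching the file, so writes never contend.
        using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write))
        {
            foreach (var chunk in chunks.GetConsumingEnumerable())
            {
                fs.Seek(chunk.Item1, SeekOrigin.Begin);
                fs.Write(chunk.Item2, 0, chunk.Item2.Length);
            }
        }

        producers.Wait();
    }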

+4

Here is my hunch based on the information so far:

On Windows, when you write to a position that extends the file size, Windows needs to zero-initialize everything before it. This prevents old disk data from leaking, which would be a security issue.

Most likely, every thread except your first one has to zero-initialize so much data that it runs into the timeout: that very first write far into the file takes a long time to complete.

If you have the LPIM privilege, you can avoid the zero-initialization; otherwise you cannot, for security reasons. Free Download Manager, for example, displays a message that zero-initialization has begun at the start of each download.
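For illustration, here is a sketch of skipping the zero-fill via the Win32 SetFileValidData call; note that its documentation names SeManageVolumePrivilege ("Perform volume maintenance tasks") as the required privilege, which must already be enabled in the process token. Treat this as an assumption about the mechanism meant above:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    class FastAllocate
    {
        // Win32 call that moves the "valid data length" forward without zero-filling.
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

        static void Main()
        {
            long fileSize = 1024L * 1024 * 1024; // 1 GB
            using (var fs = new FileStream(@"C:\MyCoolFile.txt", FileMode.Create,
                                           FileAccess.Write, FileShare.Write))
            {
                fs.SetLength(fileSize); // reserve the space (sets EOF, not valid data)
                if (!SetFileValidData(fs.SafeFileHandle, fileSize))
                    throw new IOException("SetFileValidData failed; privilege missing or not enabled?");
                // Seeks and writes anywhere in the file now skip the zero-fill,
                // at the cost of potentially exposing stale on-disk data.
            }
        }
    }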

+1
  fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin); 

This Seek() call is the problem: it seeks to a part of the file that is potentially very far past the current end of the file. Your next fileStream.Write() call forces the file system to extend the file on disk, filling the unwritten parts with zeros.

This can take a while; your thread is blocked until the file system finishes extending the file, quite possibly long enough to trip the timeout. You would see this go wrong right at the start of the transfer, not later.

The workaround is to create and fill the entire file before writing the real data. This is otherwise a very common strategy used by downloaders; you may have seen .part files before. A nice additional benefit is that you get a decent guarantee that the transfer cannot fail because the disk ran out of space. Note that filling a file with zeros is only cheap when the machine has enough RAM; 1 GB should not be a problem on modern machines.
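A minimal sketch of that pre-fill step, assuming the total fileSize is known up front (for example from the Content-Length of a preliminary request):

    // Fill the file with zeros once, sequentially, before the parallel
    // transfer starts; later Seek()+Write() calls then land inside
    // already-initialized space and never block on file extension.
    using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write))
    {
        var zeros = new byte[64 * 1024];
        long remaining = fileSize;
        while (remaining > 0)
        {
            int count = (int)Math.Min(zeros.Length, remaining);
            fs.Write(zeros, 0, count);
            remaining -= count;
        }
    }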

Repro code:

    using System;
    using System.IO;
    using System.Diagnostics;

    class Program
    {
        static void Main(string[] args)
        {
            string path = @"c:\temp\test.bin";
            var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Write);
            fs.Seek(1024L * 1024 * 1024, SeekOrigin.Begin);
            var buf = new byte[4096];
            var sw = Stopwatch.StartNew();
            fs.Write(buf, 0, buf.Length);
            sw.Stop();
            Console.WriteLine("Writing 4096 bytes took {0} milliseconds", sw.ElapsedMilliseconds);
            Console.ReadKey();
            fs.Close();
            File.Delete(path);
        }
    }

Output:

 Writing 4096 bytes took 1491 milliseconds 

That was on a fast SSD; a spindle drive will take much longer.

+1

System.Net.Sockets.NetworkStream.Write

The stack trace shows that the error occurs while writing to the server: it is a timeout. It may be due to:

  • network failure or congestion
  • an unresponsive server

This is not a problem with writing to a file. Analyze the network and the server; the server may not be ready for concurrent use.

You can prove this theory by disabling the file writes: the error should remain.
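One way to run that experiment, keeping the question's per-range loop but draining each response into Stream.Null instead of the file:

    // Same download, but the bytes are discarded instead of written,
    // taking disk I/O out of the picture entirely.
    using (Stream responseStream = webRequest.GetResponse().GetResponseStream())
    {
        responseStream.CopyTo(Stream.Null); // Stream.Null ignores all writes
    }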

0
