Performance issues in FileStream.Write when writing bytes decoded using AsciiEncoding.GetBytes and Convert.FromBase64String

I am facing a performance issue when using the FileStream.Write function.

I have a console application that I use to read a Base64 string from a file (400 KB in size) using a StreamReader object. I convert this string to an array of bytes using Convert.FromBase64String. Then I write this byte array to a file using a FileStream object. The resulting byte array length was 334991.

I measured the time taken to write an array of bytes, and it takes about 0.116 seconds.

Just for fun, I got an array of bytes from the same Base64 encoded string using the ASCIIEncoding.GetBytes function (although I knew that this would not give the correct DECODED result - I just wanted to try). I wrote this byte array to a file using a FileStream object. The resulting byte array length was 458414.

I measured the time taken to write an array of bytes using this methodology, and it is approximately 0.008 seconds.

Here is the sample code:

    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch stopWatch = new Stopwatch();
            TimeSpan executionTime;

            StreamReader sr = new StreamReader("foo.txt");
            string sampleString = sr.ReadToEnd();
            sr.Close();

            ////1. Convert to bytes using Base64 Decoder (The real output!)
            //byte[] binaryData = Convert.FromBase64String(sampleString);

            //2. Convert to bytes using AsciiEncoding (Just for Fun!)
            byte[] binaryData = new System.Text.ASCIIEncoding().GetBytes(sampleString);

            Console.WriteLine("Byte Length: " + binaryData.Length);

            stopWatch.Start();
            FileStream fs = new FileStream("bar.txt", FileMode.Create, FileAccess.Write);
            fs.Write(binaryData, 0, binaryData.Length);
            fs.Flush();
            fs.Close();
            stopWatch.Stop();

            executionTime = stopWatch.Elapsed;
            Console.WriteLine("FileStream Write - Total Execution Time: " + executionTime.TotalSeconds.ToString());
            Console.Read();
        }
    }

I ran this test on about 5,000 Base64-encoded files, and the difference between the write times for the two kinds of byte array is almost 10x, with the real Base64-decoded byte array taking more time.

The length of the byte array obtained using Convert.FromBase64String is less than the length obtained using ASCIIEncoding.GetBytes (as expected, since Base64 decoding yields 3 bytes for every 4 input characters, while ASCIIEncoding yields one byte per character).

Interestingly, all I'm trying to do is write a bunch of bytes using a FileStream object. So why should there be such a sharp difference in performance (in terms of time) when writing these byte arrays to disk?

Or am I doing something terribly wrong? Please advise.

4 answers

For starters, DateTime has a low resolution (IIRC about 0.018 s), so it is better to use the Stopwatch class.

That does not fully explain the difference, but it will give you more reliable numbers.
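A minimal sketch of the two timing approaches, just to illustrate the point (the file names and array size are placeholders, not the asker's data):

    using System;
    using System.Diagnostics;
    using System.IO;

    class TimingDemo
    {
        static void Main()
        {
            byte[] data = new byte[334991]; // roughly the decoded size from the question

            DateTime start = DateTime.Now;               // low-resolution wall clock
            File.WriteAllBytes("bar_datetime.bin", data);
            TimeSpan coarse = DateTime.Now - start;

            Stopwatch sw = Stopwatch.StartNew();         // high-resolution timer
            File.WriteAllBytes("bar_stopwatch.bin", data);
            sw.Stop();

            Console.WriteLine("DateTime:  {0:F4} s", coarse.TotalSeconds);
            Console.WriteLine("Stopwatch: {0:F4} s (high resolution: {1})",
                sw.Elapsed.TotalSeconds, Stopwatch.IsHighResolution);
        }
    }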


I gave some advice on another question; check out those tools and links from MS Research.

They will help you troubleshoot, or at least understand, any potential I/O problems.

In addition, you should be aware of the issues around the CLR Large Object Heap (LOH). In particular, any array over roughly 85,000 bytes is allocated on the LOH, which interacts suboptimally with the garbage collector, especially if you do this 5,000 times in the same process.
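As a rough illustration of the LOH point (the threshold and the reported generation are runtime details, so treat this as a sketch, not a guarantee):

    using System;

    class LohDemo
    {
        static void Main()
        {
            byte[] small = new byte[80 * 1024];   // below the ~85,000-byte LOH threshold
            byte[] large = new byte[458414];      // the ASCII byte array size from the question

            // Objects allocated on the Large Object Heap are typically reported as
            // generation 2 immediately, while small objects start in generation 0.
            Console.WriteLine("small array generation: " + GC.GetGeneration(small));
            Console.WriteLine("large array generation: " + GC.GetGeneration(large));
        }
    }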

Really, though, looking again, I do not think this is closely related to your actual problem. I ran this code in a profiler, and it simply shows that Convert.FromBase64String consumes nearly all of the cycles.

A few other things about your test code: you should always run your test at least twice in a row, so that the JIT has a chance to compile the code at runtime; otherwise the first run can be dramatically slower. I think you need to re-evaluate your test harness to account for the JIT and for possible Large Object Heap effects (for example, put one of these routines in front of the other and compare).
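Here is a hedged sketch of what such a harness could look like, timing the decode and the write separately and discarding the first (warm-up) pass; foo.txt and bar.bin are placeholder file names:

    using System;
    using System.Diagnostics;
    using System.IO;

    class Harness
    {
        static void Main()
        {
            string base64 = File.ReadAllText("foo.txt"); // placeholder input file

            // Run everything twice: the first pass lets the JIT compile the code
            // (and warms the file-system cache); only the second pass is reported.
            for (int pass = 0; pass < 2; pass++)
            {
                Stopwatch decode = Stopwatch.StartNew();
                byte[] bytes = Convert.FromBase64String(base64);
                decode.Stop();

                Stopwatch write = Stopwatch.StartNew();
                using (FileStream fs = new FileStream("bar.bin", FileMode.Create, FileAccess.Write))
                {
                    fs.Write(bytes, 0, bytes.Length);
                }
                write.Stop();

                if (pass == 1)
                {
                    Console.WriteLine("Decode: {0:F4} s, Write: {1:F4} s",
                        decode.Elapsed.TotalSeconds, write.Elapsed.TotalSeconds);
                }
            }
        }
    }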


I think the main problem with your test is that you are comparing cabbages with carrots (a French expression, roughly "apples and oranges"):

Convert.FromBase64String and ASCIIEncoding().GetBytes do not do the same thing at all.

Just try using an arbitrary text file as input to your program: it will fail with FromBase64String, while it works fine with ASCIIEncoding.
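A small sketch of that difference, using a made-up input string:

    using System;
    using System.Text;

    class InputDemo
    {
        static void Main()
        {
            string plainText = "This is ordinary text, not Base64.";

            // ASCIIEncoding accepts anything: one byte per character, no validation.
            byte[] asciiBytes = new ASCIIEncoding().GetBytes(plainText);
            Console.WriteLine("ASCII bytes: " + asciiBytes.Length);

            try
            {
                // FromBase64String validates the input and rejects non-Base64 text.
                Convert.FromBase64String(plainText);
            }
            catch (FormatException ex)
            {
                Console.WriteLine("FromBase64String rejected the input: " + ex.Message);
            }
        }
    }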

Now for an explanation of performance:

  • ASCIIEncoding().GetBytes simply takes each character of your file and converts it to a byte (which is pretty straightforward: there is almost nothing to do). For example, it converts "A" to 0x41 and "Z" to 0x5A...

  • Convert.FromBase64String is another story. It really translates a Base64-encoded string back into an array of bytes. A Base64 string is a "printable" representation of binary data; in other words, a textual representation that allows binary data to be transmitted, for example, over the Internet wire. Images in emails are Base64 encoded because email protocols are text based. So converting Base64 back and forth to bytes is not straightforward, and there is a performance cost.

FYI, a Base64 string looks something like this:

SABlAGwAbABvAHcAIABXAG8AcgBsAGQAIQA=

which translates to "Hello World!". Not exactly obvious, right?

Here is more information about the Base64 format: http://en.wikipedia.org/wiki/Base64
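To make the length difference concrete, here is a small sketch using the example string above (the decoded byte count and text depend entirely on that string):

    using System;
    using System.Text;

    class Base64Demo
    {
        static void Main()
        {
            string base64 = "SABlAGwAbABvAHcAIABXAG8AcgBsAGQAIQA=";

            // Decoding collapses every 4 Base64 characters into 3 bytes of data.
            byte[] decoded = Convert.FromBase64String(base64);
            Console.WriteLine("Decoded bytes: " + decoded.Length);                 // 26
            Console.WriteLine("Decoded text:  " + Encoding.Unicode.GetString(decoded));

            // ASCIIEncoding just maps each of the 36 characters to one byte.
            byte[] ascii = new ASCIIEncoding().GetBytes(base64);
            Console.WriteLine("ASCII bytes:   " + ascii.Length);                   // 36
        }
    }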

Hope this helps


You might want to take a look at a series of articles (and an accompanying source project) that Jon Skeet recently wrote on this subject:

here and here

In particular, he compared buffered and streaming reads, but there were also interesting results for different file sizes and numbers of threads.

