I need help converting a VERY LARGE binary file (a ZIP file) to a Base64String and back again. The files are too large to load into memory all at once (they throw OutOfMemoryExceptions), otherwise this would be a simple task. I do not want to process the contents of the ZIP file individually; I want to process the entire ZIP file.
Problem:
I can convert a whole ZIP file (test sizes currently range from 1 MB to 800 MB) to a Base64String, but when I convert it back, it is corrupted. The new ZIP file is the right size, it is recognized as a ZIP file by Windows and by WinRAR, 7-Zip, etc., and I can even look inside the ZIP file and see the contents with the correct sizes/properties. But when I try to extract from the ZIP file, I get: "Error: 0x80004005", which is a generic error code.
I am not sure where or why the corruption occurs. I did some investigating, and I noticed the following:
If you have a large text file, you can easily convert it to a Base64String incrementally. If calling Convert.ToBase64String on the whole file yielded "abcdefghijklmnopqrstuvwx", then calling it on the file in two halves would yield "abcdefghijkl" and "mnopqrstuvwx".
Unfortunately, if the file is binary, the result is different. While the whole file might yield "abcdefghijklmnopqrstuvwx", processing it in two halves yields something like "oiweh87yakgb" and "kyckshfguywp".
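The cause of this behavior is that Base64 maps each group of 3 input bytes to 4 output characters; if a chunk's length is not a multiple of 3, the encoder pads the final group with '=', so the concatenation of independently encoded chunks no longer matches the encoding of the whole. A minimal sketch illustrating this (the byte values are arbitrary placeholders, not from my actual file):

```csharp
using System;

class Base64ChunkDemo
{
    static void Main()
    {
        byte[] data = { 0x50, 0x4B, 0x03, 0x04, 0xFF, 0x00 }; // arbitrary binary bytes

        string whole = Convert.ToBase64String(data);

        // Split at a multiple of 3: each 3-byte group maps to exactly
        // 4 characters with no padding, so the pieces concatenate cleanly.
        string first  = Convert.ToBase64String(data, 0, 3);
        string second = Convert.ToBase64String(data, 3, 3);
        Console.WriteLine(whole == first + second);       // True

        // Split at a non-multiple of 3: the first piece gets '=' padding,
        // so the concatenation differs from the whole-file encoding.
        string badFirst  = Convert.ToBase64String(data, 0, 2);
        string badSecond = Convert.ToBase64String(data, 2, 4);
        Console.WriteLine(whole == badFirst + badSecond); // False
    }
}
```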
Is there a way to incrementally Base64-encode a binary file while avoiding this corruption?
My code is:
private void ConvertLargeFile()
{
    FileStream inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read);
    byte[] buffer = new byte[MultipleOfThree];
    int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
    while (bytesRead > 0)
    {
        byte[] secondaryBuffer = new byte[buffer.Length];
        int secondaryBufferBytesRead = bytesRead;
        Array.Copy(buffer, secondaryBuffer, buffer.Length);
        bool isFinalChunk = false;
        Array.Clear(buffer, 0, buffer.Length);
        bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
        {
            // Final chunk: trim the buffer to the number of bytes actually read.
            isFinalChunk = true;
            buffer = new byte[secondaryBufferBytesRead];
            Array.Copy(secondaryBuffer, buffer, buffer.Length);
        }
        string base64String = Convert.ToBase64String(isFinalChunk ? buffer : secondaryBuffer);
        File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
    }
    inputStream.Dispose();
}
Decoding is similar. I use the length of the base64String variable above (which varies depending on the buffer size I am testing with) as the size of the read buffer for decoding. Then, instead of Convert.ToBase64String(), I call Convert.FromBase64String() and write the result to a different file name/path.
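For reference, the decode side has the mirror-image invariant: every 4 Base64 characters decode to 3 bytes, so each text chunk read must be a multiple of 4 characters long to avoid mid-stream padding. A minimal sketch of what I mean, with placeholder paths and a hypothetical chunk size (not my exact code):

```csharp
using System;
using System.IO;

class Base64DecodeChunked
{
    static void Main()
    {
        // Placeholder paths; adjust to match your encoded file.
        const string inputPath  = "C:\\Users\\test\\Desktop\\Base64Zip";
        const string outputPath = "C:\\Users\\test\\Desktop\\restored.zip";

        // Chunk length must be a multiple of 4 so every full chunk
        // decodes to whole bytes; only the final chunk may carry padding.
        const int charChunk = 4 * 1024;

        using (var reader = new StreamReader(inputPath))
        using (var output = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            char[] buffer = new char[charChunk];
            int charsRead;
            // ReadBlock fills the buffer completely except at end of file,
            // so intermediate chunks stay multiples of 4 characters.
            while ((charsRead = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                byte[] bytes = Convert.FromBase64CharArray(buffer, 0, charsRead);
                output.Write(bytes, 0, bytes.Length);
            }
        }
    }
}
```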
EDIT:
In my haste to reduce the code (I moved it into a new project, separate from other processing, to exclude code that is not central to this problem), I introduced a bug. The Base64 conversion should be performed on secondaryBuffer for every iteration except the last (identified by isFinalChunk), where buffer should be used instead. I have corrected the code above.
EDIT No. 2:
Thank you all for the comments/feedback. After fixing the bug (see the edit above), I re-tested my code and it actually works now. I intend to test and implement @rene's solution, as it seems to be the best, but I thought I should let everyone know what I found.