This is a really interesting question. Compression is CPU-bound, relying on lots of searching and comparing. So it's very natural to want to parallelize it when you have multiple CPUs with unimpeded access to memory.
There is a class in the DotNetZip library called ParallelDeflateOutputStream that does what you're describing. The class is documented in the DotNetZip documentation.
It can be used only for compression, not decompression. Also, it is strictly an output stream; you cannot read to decompress with it. Within those constraints, it is basically a DeflateOutputStream that internally uses multiple threads.
How it works: it breaks the incoming stream into chunks, tosses each chunk to a separate worker thread to be compressed independently, then merges all the compressed chunks back into a single ordered stream at the end.
Suppose the “chunk” size maintained by the stream is N bytes. As the caller calls Write(), the data is buffered into a bucket, or chunk. Inside Stream.Write(), when the first bucket fills, it calls ThreadPool.QueueUserWorkItem (QUWI), handing the bucket off to the work item. Subsequent writes into the stream begin filling the next bucket, and when that one fills, Stream.Write() calls QUWI again. Each worker thread compresses its bucket using a flush type of Sync (see the DEFLATE specification), then marks its compressed blob as ready for output. These outputs are then re-ordered (because chunk n does not necessarily finish compressing before chunk n+1) and written to the output stream. As each bucket is written out, it is marked empty, ready to be re-filled by a later Stream.Write(). Each chunk must be compressed with flush type Sync so the pieces can be re-combined by simple concatenation, and the combined bytestream is still a legal DEFLATE stream. The final chunk requires flush type Finish.
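To illustrate the shape of that fill-a-bucket / queue-a-work-item / re-order pattern, here is a minimal sketch. This is not the library's actual source; CompressChunkWithSyncFlush is a hypothetical placeholder standing in for a raw DEFLATE pass that ends with a Sync flush (e.g. via Ionic.Zlib.ZlibCodec), and the real class drains completed buckets as it goes rather than waiting for all of them:

    // Sketch only, not the library's code. CompressChunkWithSyncFlush is a
    // hypothetical placeholder for a DEFLATE pass ending in a Sync flush.
    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    static class ParallelDeflateSketch
    {
        static byte[] CompressChunkWithSyncFlush(byte[] chunk)
        {
            return chunk; // placeholder: real code deflates with flush type Sync
        }

        public static void Compress(Stream input, Stream output, int bucketSize)
        {
            var compressed = new ConcurrentDictionary<int, byte[]>();
            var buffer = new byte[bucketSize];
            int n, index = 0;
            using (var pending = new CountdownEvent(1))
            {
                while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    var bucket = new byte[n];        // fill one bucket
                    Array.Copy(buffer, bucket, n);
                    int myIndex = index++;
                    pending.AddCount();
                    ThreadPool.QueueUserWorkItem(_ =>
                    {
                        // each worker compresses its own bucket independently
                        compressed[myIndex] = CompressChunkWithSyncFlush(bucket);
                        pending.Signal();
                    });
                }
                pending.Signal();
                pending.Wait(); // the real class drains buckets as they complete
            }
            // re-order: bucket i may finish after bucket i+1
            for (int i = 0; i < index; i++)
                output.Write(compressed[i], 0, compressed[i].Length);
            // a real implementation would end the stream with flush type Finish
        }
    }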
This design means that callers do not need to write from multiple threads. The caller just creates the stream as usual, like a vanilla DeflateStream used for output, and writes into it. The stream object uses multiple threads internally, but your code never interacts with them directly. Code that uses ParallelDeflateOutputStream looks like this:
    using (FileStream raw = new FileStream(CompressedFile, FileMode.Create))
    using (FileStream input = File.OpenRead(FileToCompress))
    using (var compressor = new Ionic.Zlib.ParallelDeflateOutputStream(raw))
    {
        // write the plaintext into the compressor, as with any output stream
        input.CopyTo(compressor);
    }
It was built for use by the DotNetZip ZipFile class, but works perfectly well as a standalone compressing output stream. The resulting stream can be decompressed (inflated?) with any inflater. The output is fully compliant with the spec.
The stream is configurable. You can set the size of the buffers used and the level of parallelism. It does not create buckets without limit, because on large streams (GB scale and up) that would exhaust memory. So there is a fixed cap on the number of buckets, and therefore on the degree of parallelism that can be exploited.
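For illustration, setting those knobs might look like the snippet below. The property names (BufferSize, MaxBufferPairs) and the constructor overload taking a CompressionLevel are as I recall them from later DotNetZip builds; treat them as an assumption and check the documentation for your version:

    // Hedged example: BufferSize and MaxBufferPairs are assumed property
    // names from recent DotNetZip builds; older versions may differ.
    using (var input = File.OpenRead(FileToCompress))
    using (var raw = new FileStream(CompressedFile, FileMode.Create))
    using (var compressor = new Ionic.Zlib.ParallelDeflateOutputStream(
               raw, Ionic.Zlib.CompressionLevel.BestSpeed))
    {
        compressor.BufferSize = 256 * 1024; // size of each bucket (chunk)
        compressor.MaxBufferPairs = 8;      // caps buckets in flight, hence memory
        input.CopyTo(compressor);
    }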
On my dual-core machine, this stream class nearly doubled the compression speed of large (100 MB and up) files, compared with the standard DeflateStream. I don't have any larger multi-core machines, so I couldn't test it further. The trade-off is that the parallel implementation uses more CPU and more memory, and also compresses slightly less effectively (about 1% worse on large files) because of the Sync framing described above. The performance benefit will vary with the I/O bandwidth of your output stream, and whether the storage can keep up with the parallel compressor threads.
Caveat:
This is a DEFLATE stream, not a GZIP stream. For the differences, read RFC 1951 (DEFLATE) and RFC 1952 (GZIP).
But if you really need GZIP, the source for this stream is available, so you can browse it and maybe get some ideas. GZIP is really just a shell on top of DEFLATE, with some extra metadata (for example, a CRC-32 checksum and the uncompressed length; see the specification). It seems to me that building a ParallelGzipOutputStream would not be very hard, but it may not be trivial either.
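To show how thin that shell is, here is the framing RFC 1952 requires around a raw DEFLATE payload. The WriteGzip helper is purely illustrative (mine, not from the library), and assumes the CRC-32 of the uncompressed data has been computed elsewhere:

    // Illustrative only: the GZIP wrapper per RFC 1952 around a raw
    // DEFLATE payload. crc32 is the CRC-32 of the *uncompressed* data.
    static void WriteGzip(Stream output, byte[] deflatePayload,
                          uint crc32, long uncompressedLength)
    {
        // 10-byte header: ID1 ID2, CM=8 (deflate), FLG=0, MTIME=0,
        // XFL=0, OS=255 (unknown)
        byte[] header = { 0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255 };
        output.Write(header, 0, header.Length);

        output.Write(deflatePayload, 0, deflatePayload.Length);

        // trailer: CRC-32, then ISIZE = length mod 2^32, both little-endian
        // (BitConverter emits little-endian bytes on the usual platforms)
        output.Write(BitConverter.GetBytes(crc32), 0, 4);
        output.Write(BitConverter.GetBytes((uint)uncompressedLength), 0, 4);
    }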
Getting the semantics of Flush() and Close() right was the hardest part for me.
EDIT
Just for fun, I built a ParallelGZipOutputStream, which basically does what I described above, for GZIP. It uses .NET 4.0 Tasks instead of QUWI to handle the parallel compression. I tested it just now on a 100 MB text file generated with a Markov chain text generator, and compared the results of this class against some other options. Here's what it looks like:
    uncompressed: 104857600
    running 2 cycles, 6 flavors

    System.IO.Compression.GZipStream: .NET 2.0 builtin
      compressed: 47550941  ratio: 54.65%  Elapsed: 19.22s

    ICSharpCode.SharpZipLib.GZip.GZipOutputStream: 0.86.0.518
      compressed: 37894303  ratio: 63.86%  Elapsed: 36.43s

    Ionic.Zlib.GZipStream: DotNetZip v1.9.1.5, CompLevel=Default
      compressed: 37896198  ratio: 63.86%  Elapsed: 39.12s

    Ionic.Zlib.GZipStream: DotNetZip v1.9.1.5, CompLevel=BestSpeed
      compressed: 47204891  ratio: 54.98%  Elapsed: 15.19s

    Ionic.Exploration.ParallelGZipOutputStream: DotNetZip v1.9.1.5, CompLevel=Default
      compressed: 39524723  ratio: 62.31%  Elapsed: 20.98s

    Ionic.Exploration.ParallelGZipOutputStream: DotNetZip v1.9.1.5, CompLevel=BestSpeed
      compressed: 47937903  ratio: 54.28%  Elapsed: 9.42s
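The harness itself isn't shown above; here is a sketch of the kind of measurement loop that produces numbers like these. The Measure helper and the file name are illustrative assumptions, not the actual test code:

    // Hypothetical harness: times one compressor and reports size, ratio,
    // and elapsed seconds. The factory parameter lets you plug in any
    // compressing output stream.
    static void Measure(string name, string inputFile,
                        Func<Stream, Stream> makeCompressor)
    {
        long rawSize = new FileInfo(inputFile).Length;
        var sink = new MemoryStream();
        var sw = System.Diagnostics.Stopwatch.StartNew();
        using (var input = File.OpenRead(inputFile))
        using (var compressor = makeCompressor(sink))
            input.CopyTo(compressor);  // closing the compressor finalizes output
        sw.Stop();
        long size = sink.ToArray().LongLength;  // ToArray works after close
        Console.WriteLine("{0}: compressed {1}  ratio {2:P2}  elapsed {3:0.00}s",
            name, size, 1.0 - (double)size / rawSize, sw.Elapsed.TotalSeconds);
    }

    // example invocation: the built-in .NET GZipStream
    Measure("System.IO.Compression.GZipStream", "markov-100mb.txt",
        s => new System.IO.Compression.GZipStream(
                 s, System.IO.Compression.CompressionMode.Compress));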
Conclusions:
The GZipStream built into .NET is pretty fast. It is also not very efficient at compressing, and it is not configurable.
"BestSpeed" on the vanilla (non-parallel) GZipStream in DotNetZip is about 20% faster than the built-in .NET stream, and gives about the same compression.
Using multiple compression tasks cut the elapsed time by about 45% on my dual-core laptop (3 GB RAM), comparing the vanilla DotNetZip GZipStream to the parallel one. I expect the time savings would be larger on machines with more cores.
The cost of the parallel GZIP chunking is an increase of about 4% in the size of the compressed file. That does not change with the number of cores used.
The resulting .gz file can be decompressed with any GZIP tool.