How to use multiple threads for zlib compression (same input source)

Question

My goal is to compress data from a single source in parallel threads. I have defined tasks, kept in a list; each task describes a block of input to read (500 KB to 1 MB per task).

My compressor threads compress each block of data using zlib and store the result in the outbuf of the corresponding job.

Now I want to combine all of this into one output file in the standard zlib format.

From the zlib RFC (RFC 1950) and after looking at the source of pigz, I understand that

a zlib stream is laid out as below

     +---+---+
     |CMF|FLG| (2 bytes)
     +---+---+
     +---+---+---+---+
     |     DICTID    | (4 bytes. Present only when FLG.FDICT is set)
     +---+---+---+---+
     +=====================+
     |...compressed data...| (variable size of data)
     +=====================+
     +---+---+---+---+
     |     ADLER32   |  (4 bytes, checksum of the uncompressed data)
     +---+---+---+---+

In my case there is no dictionary.
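The layout above can be checked against a real stream. The following is a minimal sketch in Python, whose standard zlib module wraps the same library; the helper name parse_zlib_stream is hypothetical, and it assumes a complete stream with FLG.FDICT clear, as in the case described here.

```python
import struct
import zlib


def parse_zlib_stream(stream: bytes) -> dict:
    """Split a complete zlib stream (no preset dictionary) into its fields."""
    cmf, flg = stream[0], stream[1]
    # RFC 1950: CMF*256 + FLG must be a multiple of 31.
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("bad zlib header check")
    # Bit 5 of FLG is FDICT; this sketch assumes it is not set.
    if flg & 0x20:
        raise ValueError("FDICT is set; a 4-byte DICTID would follow the header")
    # The trailer is the Adler-32 of the uncompressed data, big-endian.
    (adler,) = struct.unpack(">I", stream[-4:])
    return {
        "method": cmf & 0x0F,           # CM, 8 means deflate
        "window_bits": (cmf >> 4) + 8,  # CINFO + 8
        "deflate_data": stream[2:-4],   # raw deflate bytes between header and trailer
        "adler32": adler,
    }


data = b"hello zlib " * 100
fields = parse_zlib_stream(zlib.compress(data))
print(fields["method"], fields["window_bits"], hex(fields["adler32"]))
```

The bytes between the 2-byte header and the 4-byte trailer are a raw deflate stream, which is the part that has to be stitched together when several blocks are merged.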

So when I combine the compressed blocks, the header of every block is the same.

So I do the following.

For the first block, I write the header + compressed data.
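For the remaining blocks, I write only the compressed data (no header).
I use adler32_combine() to compute the Adler-32 of the whole input.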
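To illustrate what adler32_combine() computes: it merges the Adler-32 of two consecutive blocks into the Adler-32 of their concatenation, using only the two checksums and the length of the second block. As far as I know Python's zlib module does not expose that function, so the sketch below re-derives the arithmetic from the definition of Adler-32; it is not zlib's own code.

```python
import zlib

BASE = 65521  # largest prime below 2**16, the Adler-32 modulus


def adler32_combine(adler1: int, adler2: int, len2: int) -> int:
    """Adler-32 of (block1 + block2), given each block's checksum and len(block2)."""
    a1, b1 = adler1 & 0xFFFF, adler1 >> 16
    a2, b2 = adler2 & 0xFFFF, adler2 >> 16
    # The low half is 1 + sum of bytes; each checksum carries its own "+1".
    a = (a1 + a2 - 1) % BASE
    # The high half sums the running low half after every byte. Starting the
    # second block from a1 instead of 1 shifts each of its len2 terms by (a1 - 1).
    b = (b1 + b2 + len2 * (a1 - 1)) % BASE
    return (b << 16) | a


data = bytes(range(256)) * 50
head, tail = data[:1000], data[1000:]
combined = adler32_combine(zlib.adler32(head), zlib.adler32(tail), len(tail))
print(combined == zlib.adler32(data))
```

Because each block's checksum depends only on that block, the compressor threads can compute them independently and the writer can fold them together in order.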



multithreading linux compression zlib

mk.. Jun 12 '15 at 1:37


Mark Adler · Answer 1 · 2015-06-12T08:09:44+0000

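pigz shows how to do this. It ends each block with Z_SYNC_FLUSH, which brings the compressed data to a byte boundary without marking the deflate stream as finished, so the pieces can simply be concatenated. Only the last piece is finished normally: n-1 sync-flushed pieces followed by 1 finished piece.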
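A minimal sketch of the Z_SYNC_FLUSH approach, written in Python because its zlib module wraps the same library. It assumes a non-empty list of non-empty blocks; the names compress_block and parallel_zlib are hypothetical, the fixed header bytes 0x78 0x9C are one valid CMF/FLG pair chosen for illustration, and adler32_combine is re-derived here rather than taken from zlib.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

BASE = 65521  # Adler-32 modulus


def adler32_combine(adler1: int, adler2: int, len2: int) -> int:
    """Adler-32 of two concatenated blocks from their individual checksums."""
    a1, b1 = adler1 & 0xFFFF, adler1 >> 16
    a2, b2 = adler2 & 0xFFFF, adler2 >> 16
    return (((b1 + b2 + len2 * (a1 - 1)) % BASE) << 16) | ((a1 + a2 - 1) % BASE)


def compress_block(job):
    block, is_last = job
    # wbits=-15 selects raw deflate, so no per-block zlib header or trailer.
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    out = comp.compress(block)
    # The first n-1 blocks end on a byte boundary without the final-block
    # marker; only the last block terminates the deflate stream.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out, zlib.adler32(block), len(block)


def parallel_zlib(blocks, workers=4):
    """Compress blocks in worker threads and join them into one zlib stream."""
    jobs = [(b, i == len(blocks) - 1) for i, b in enumerate(blocks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(compress_block, jobs))  # keeps input order
    out = bytearray(b"\x78\x9c")  # CMF/FLG: deflate, 32K window, no dictionary
    adler = 1                     # Adler-32 of empty input
    for deflate_bytes, block_adler, block_len in results:
        out += deflate_bytes
        adler = adler32_combine(adler, block_adler, block_len)
    out += struct.pack(">I", adler)  # trailer, big-endian
    return bytes(out)


data = bytes(range(256)) * 4000
blocks = [data[i:i + 262144] for i in range(0, len(data), 262144)]
stream = parallel_zlib(blocks)
print(zlib.decompress(stream) == data)
```

Each block here is compressed independently, which costs some compression ratio at block boundaries. pigz additionally primes each block with the preceding 32 KB of input as a dictionary; this sketch leaves that out.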
