While it may be tempting to tailor a compression algorithm specifically to your problem, doing so is likely to take an unreasonable amount of time and effort, whereas a standard compression method will immediately give you a large reduction in memory consumption.
The “standard” way to deal with this is to cut the source data into small blocks (for example, 256 KB) and compress each block individually. To access data inside a block, you must first decode that block. The optimal block size therefore depends on your access pattern: the more streaming-like (sequential) the access, the larger the blocks can be; the more random the access, the smaller the blocks should be.
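A minimal sketch of that scheme, assuming the LZ4 C library (the 256 KB block size and the way blocks are indexed are purely illustrative):

```c
#include <stdlib.h>
#include <lz4.h>

#define BLOCK_SIZE (256 * 1024)  /* illustrative block size */

/* Compress one block independently; returns the compressed size (0 on error).
   Each block can later be decoded on its own, without touching its neighbours. */
static int compress_block(const char *src, int src_size, char *dst, int dst_capacity)
{
    return LZ4_compress_default(src, dst, src_size, dst_capacity);
}

/* Walk over the source data block by block. In a real container you would also
   store each block's offset and compressed size so it can be located later. */
static void compress_all(const char *data, size_t total_size)
{
    int bound = LZ4_compressBound(BLOCK_SIZE);
    char *dst = malloc((size_t)bound);

    for (size_t off = 0; off < total_size; off += BLOCK_SIZE) {
        size_t remaining = total_size - off;
        int this_block = (int)(remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE);
        int csize = compress_block(data + off, this_block, dst, bound);
        (void)csize; /* record (offset, csize) in your block index here */
    }
    free(dst);
}
```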
If compression / decompression speed is a concern, use a high-speed algorithm. If decompression speed is the most important metric (because it drives access time), an algorithm like LZ4 delivers roughly 1 GB/s of decoding throughput per core, which gives you an idea of how many blocks per second you can decode.
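For example, reading data that falls inside a given block means decoding just that block into a scratch buffer first; a small sketch using LZ4's safe decoder (locating the compressed block via your own index is assumed):

```c
#include <lz4.h>

/* Decode one compressed block back into a 256 KB scratch buffer before reading
   from it. At roughly 1 GB/s per core, a 256 KB block costs on the order of a
   quarter of a millisecond to decode. */
static int decode_block(const char *compressed, int compressed_size,
                        char *out, int out_capacity)
{
    /* Returns the number of decoded bytes, or a negative value on error. */
    return LZ4_decompress_safe(compressed, out, compressed_size, out_capacity);
}
```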
If only decompression speed matters (i.e., compression itself can be slow), you can use the high-compression variant LZ4-HC, which improves the compression ratio by roughly 30% and also slightly improves decompression speed.
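For blocks that are written once and read many times, the per-block compressor could be swapped for the HC variant; a hedged sketch (the level 9 shown is just a typical HC setting, and the output still decodes with the regular LZ4 decoder):

```c
#include <lz4hc.h>

/* Compress a block with LZ4-HC. Compression is slower, but the resulting
   blocks are smaller and decode at the same or slightly better speed. */
static int compress_block_hc(const char *src, int src_size, char *dst, int dst_capacity)
{
    return LZ4_compress_HC(src, dst, src_size, dst_capacity, 9 /* HC level */);
}
```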