Why is *.tar.gz still much more common than *.tar.xz?

Whenever I see source packages or binaries compressed with gzip, I wonder whether there are still reasons to favor gz over xz (short of traveling back to 2000), since the space savings of the LZMA compression algorithm are significant, and decompression is not orders of magnitude slower than gzip.

+53
compression zip decompression xz lzma
Jun 27 '11 at 13:05
8 answers

"The lowest common denominator." Additional free space is rarely worth interoperability. Most Linux embedded systems have gzip, but not xz. A lot of the old system. Gnu Tar, which is an industry standard, supports the -z flags for processing via gzip, and -j for processing via bzip2 , but some older systems do not support -j for xz , i.e. it requires a two-step operation (and a lot of additional disk space for uncompressed .tar unless you use the syntax |tar xf - ), which many people are unaware of. Also, it takes about 2 minutes to decompress a complete file system of about 10 MB in size from tar.gz on the integrated ARM and is not really a problem. No hint about xz , and bzip2 takes about 10-15 minutes. with a saved strip.

In any case, the current "modern alternative" where you sacrifice CPU time in exchange for disk space (...a trade that is still rarely welcome: bandwidth and disk space are cheap, and people hate it when a system grinds to a halt because some update is working in the background) is bzip2.

+48
Jun 27 '11 at 13:25

The real answer is availability, with a secondary answer of purpose. Reasons why XZ is not necessarily as suitable as Gzip:

  • Embedded and legacy systems are far more likely to lack the free memory needed to decompress LZMA/LZMA2 archives such as XZ. As an example, if XZ can save 400 KiB (versus Gzip) on a package destined for an OpenWrt router, what good is that saving if the router has 16 MB of RAM? A similar situation arises with very old computer systems. One might scoff at the thought of downloading and compiling the latest version of Bash on an ancient SparcStation LX with 32 MB of RAM, but it happens.

  • Such systems usually have slow CPUs as well, and the decompression-time penalty can be very high. Three seconds of decompression on your Core i5 can be a very long time on a 200 MHz ARM core or a 50 MHz microSPARC. Gzip compression is extremely fast on such processors compared with all of the better compression methods such as XZ or even Bzip2.

  • Gzip is supported by virtually every UNIX-like system (and nearly every non-UNIX-like one) created in the last two decades. XZ availability is far more limited. Compression is useless without the ability to decompress it.

  • Higher compression takes a lot of time. If compression time matters more than compression ratio, Gzip beats XZ. Honestly, lzop is much faster than Gzip and still compresses reasonably well, so applications that need the fastest possible compression and don't require Gzip's ubiquity should consider it. I regularly shuttle folders across a trusted LAN connection with commands like "tar -c * | lzop -1 | socat -u - tcp-connect:192.168.0.101:4444" (a sketch of the receiving end follows just after this list), and Gzip could be used the same way over a much slower link (i.e. doing the same thing I just described, but through an SSH tunnel over the Internet).
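As a rough illustration, the receiving end of that LAN transfer might look like the following (my reconstruction, not part of the original answer; host and port are the hypothetical ones from the command above):

    # Receiving end: start this on 192.168.0.101 first
    socat -u tcp-listen:4444,reuseaddr - | lzop -d | tar xf -

    # Sending end, as quoted in the answer:
    tar -c * | lzop -1 | socat -u - tcp-connect:192.168.0.101:4444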

Now, on the other hand, there are situations where XZ compression is vastly superior:

  • Sending data over slow links. The Linux 3.7 kernel source code is about 34 MiB smaller in XZ format than in Gzip format. If you have a super-fast connection, choosing XZ might mean saving a minute of download time; on a cheap DSL connection or a 3G cellular link, it could shave an hour or more off the download.

  • Shrinking backup archives. Compressing the Apache httpd-2.4.2 source code with "gzip -9" versus "xz -9e" yields an XZ archive that is 62.7% the size of the Gzip archive. If the same compressibility holds for a dataset you currently store as 100 gigabytes of .tar.gz archives, converting to .tar.xz would shave a whopping 37.3 gigabytes off the backup set. Copying the whole backup dataset to a USB 2.0 hard drive (which tops out around 30 megabytes/sec), the Gzipped data would take 55 minutes, while XZ compression would make the backup about 20 minutes shorter (the commands and the arithmetic are sketched after this list). Assuming you handle these backups on a modern desktop system with plenty of CPU power, and the one-time compression cost is not a serious problem, XZ compression usually makes more sense. Why haul around extra bytes you don't need?

  • Distributing large amounts of highly compressible data. As already mentioned, the Linux 3.7 source code is 67 MiB as .tar.xz and 101 MiB as .tar.gz; the uncompressed source is about 542 megabytes and almost entirely text. Source code (and text in general) compresses very well because of its redundancy, but compressors such as Gzip that work with a much smaller dictionary cannot exploit redundancy beyond their dictionary size.
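Here is a sketch of the httpd comparison and the backup arithmetic from the list above (file names are hypothetical; the exact ratios will vary with your data):

    # Reproducing the httpd comparison:
    gzip -9 -c httpd-2.4.2.tar > httpd-2.4.2.tar.gz
    xz -9e -c httpd-2.4.2.tar > httpd-2.4.2.tar.xz
    ls -l httpd-2.4.2.tar.gz httpd-2.4.2.tar.xz

    # Transfer-time arithmetic for the 100 GB example at 30 MB/s:
    #   gzip: 100 GB  / 30 MB/s ~ 3333 s ~ 55 min
    #   xz:   62.7 GB / 30 MB/s ~ 2090 s ~ 35 min  (about 20 min saved)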

In the end, it all comes down to a four-way tradeoff: compressed size, compression/decompression speed, copy/transfer speed (reading the data from disk or network), and compressor/decompressor availability. The choice hinges on the question: "What do you plan to do with this data?"

Also check out this related post, from which I learned some of the things I am repeating here.

+58
Dec 15 '12 at 19:50

I ran my own benchmark using a 1.1 GB Linux installation vmdk image:

    rar    =260MB comp= 85s decomp= 5s
    7z(p7z)=269MB comp= 98s decomp=15s
    tar.xz =288MB comp=400s decomp=30s
    tar.bz2=382MB comp= 91s decomp=70s
    tar.gz =421MB comp=181s decomp= 5s

All compression levels set to maximum; Intel i7-3740QM CPU, 32 GB 1600 MHz memory, source and destination on RAM disks.

I usually use rar or 7z for archiving regular files such as documents. For archiving system files I use .tar.gz or .tar.xz via file-roller, or tar with the -z or -J option along with --preserve-permissions, so that tar compresses and preserves permissions (alternatively, .tar.7z or .tar.rar can be used).

Update: since tar only preserves ordinary permissions anyway, not ACLs, you can also use plain .7z plus backing up and restoring permissions and ACLs manually via getfacl and setfacl. That seems to be the best option for file archiving or system-file backups: it fully preserves permissions and ACLs, carries a checksum, and offers integrity checking and encryption. The only drawback is that p7zip is not available everywhere.
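A minimal sketch of that approach (directory name hypothetical); getfacl/setfacl are assumed to be available, as on most Linux systems:

    # Archive with 7z, saving modes and ACLs separately
    getfacl -R mydir > mydir.acl     # dump owner, modes, and ACLs recursively
    7z a mydir.7z mydir              # archive the contents (p7zip)

    # Restore later:
    7z x mydir.7z
    setfacl --restore=mydir.acl      # reapply the saved modes and ACLs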

+10
Sep 18 '14 at 5:01

Honestly, I only just learned about the .xz format from some training material, so I used the training-materials git repository to run a test: git://git.free-electron.com/training-materials.git. I also built the three sets of training slides. The total directory size is 91 MB, a mix of text and binary data.

Here are my quick results. Maybe people still prefer tar.gz simply because it compresses much faster? I personally use plain tar when compression offers little advantage.

    [02:49:32]wujj@WuJJ-PC-Linux /tmp $ time tar czf test.tgz training-materials/
    real    0m3.371s
    user    0m3.208s
    sys     0m0.128s
    [02:49:46]wujj@WuJJ-PC-Linux /tmp $ time tar cJf test.txz training-materials/
    real    0m34.557s
    user    0m33.930s
    sys     0m0.372s
    [02:50:31]wujj@WuJJ-PC-Linux /tmp $ time tar cf test.tar training-materials/
    real    0m0.117s
    user    0m0.020s
    sys     0m0.092s
    [02:51:03]wujj@WuJJ-PC-Linux /tmp $ ll test*
    -rw-rw-r-- 1 wujj wujj 91944960 2012-07-09 02:51 test.tar
    -rw-rw-r-- 1 wujj wujj 69042586 2012-07-09 02:49 test.tgz
    -rw-rw-r-- 1 wujj wujj 60609224 2012-07-09 02:50 test.txz
    [02:56:03]wujj@WuJJ-PC-Linux /tmp $ time tar xzf test.tgz
    real    0m0.719s
    user    0m0.536s
    sys     0m0.144s
    [02:56:24]wujj@WuJJ-PC-Linux /tmp $ time tar xf test.tar
    real    0m0.189s
    user    0m0.004s
    sys     0m0.108s
    [02:56:33]wujj@WuJJ-PC-Linux /tmp $ time tar xJf test.txz
    real    0m3.116s
    user    0m2.612s
    sys     0m0.184s
+9
Jul 09 '12

From the author of the Lzip compression utility:

Xz is a complex format, partly specialized in the compression of executables and designed to be extended with proprietary formats. Of the compressors discussed here, xz is the only one alien to the Unix concept of "do one thing and do it well". It is the least suitable for data sharing, and not suitable at all for long-term archiving.

In general, the more complex the format, the less likely it is that it can be decoded in the future. But the xz format, like its infamous predecessor lzma-alone, is especially badly designed. Xz copies almost all the defects of gzip and then adds a few more, such as fragile variable-length integers. Just one bit flip in bit 7 of any byte of a variable-length integer and the whole xz stream comes crashing down like a house of cards. Using xz for anything other than compressing short-lived executables is inadvisable.

Don't get me wrong: I am very grateful to Igor Pavlov for inventing/discovering LZMA, but xz is the third attempt by his followers to ride on the popularity of 7zip and replace gzip and bzip2 with inappropriate or badly designed formats. In particular, it is shameful that support for lzma-alone was implemented in both GNU and Linux.

http://www.nongnu.org/lzip/lzip_benchmark.html

+7
Feb 24 '16 at 11:20

For the same reason that people on Windows® use zip files instead of 7zip, and some still use rar instead of other formats... or that mp3 is used for music rather than AAC+, etc.

Each format has its advantages, and people tend to stick with the solution they learned when they first started using computers. Add backward compatibility, fast bandwidth, and gigabytes or terabytes of disk space to that, and the benefits of higher compression stop being relevant.

+3
Jun 27 '11 at 13:45

gz is supported everywhere and is good for portability.

xz is a newer format and is not yet as widely or as well supported. It is more complex than gzip, with more compression options.

That is not the only reason people don't always use xz. xz can take a very long time to compress, far from a trivial amount, so even though it can produce superior results, it may not always be chosen. Another weakness is that it can use a lot of memory, especially for compression. The more you want to compress something, the longer it takes, with diminishing returns.
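To make the memory point concrete, xz can report and cap its own memory use; a sketch (the 100 MiB limit and the file name are arbitrary examples):

    xz --info-memory                           # show available RAM and current limits
    xz --memlimit-compress=100MiB -9 big.tar   # xz scales -9 down to fit the cap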

However, at compression level 1 on large blobs, in my experience xz can often give much smaller results in less time than zlib at level 9. The difference can sometimes be very significant: xz can create a file half the size of the zlib one.
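A hypothetical way to check this claim on your own data (file name assumed; results depend heavily on the input):

    time gzip -9 -c big.blob > big.blob.gz
    time xz -1 -c big.blob > big.blob.xz
    ls -l big.blob.gz big.blob.xz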

bzip2 sits in a similar situation, but xz's advantage is much greater, and there is a broad window in which it performs much better than everything else.

+3
Nov 02 '15 at 12:42

Also important for gzip is that it can be made rsync/zsync-friendly, which can be a huge bandwidth advantage in some cases. LZMA/bzip2/xz do not support rsync and probably will not any time soon. One characteristic of LZMA is that it uses quite a large window; making it rsync/zsync-friendly would probably require shrinking that window, which would degrade its compression performance.
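As a side note (my addition): Debian's patched gzip, and GNU gzip since version 1.7, offer an --rsyncable flag that periodically resets the compressor so rsync can resynchronize on unchanged blocks, at a small cost in compression ratio. A sketch, assuming such a build:

    tar -c mydir | gzip --rsyncable > mydir.tar.gz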

+1
Jan 05 '15 at 2:06


