I’ve known about the LZMA compression algorithm for a little while, but I haven’t really played with it. So, to give it a quick try, I thought I would sic it on all the text files in my /etc directory. I’m using GNU tar to archive the files, and the maximum compression level each algorithm offers, to get the tightest squeeze on the archive:
$ sudo tar -cf etc.tar /etc
[sudo] password for aaron:
tar: Removing leading `/' from member names
$ time gzip -c9 etc.tar > etc.tar.gz
gzip -c9 etc.tar > etc.tar.gz  8.01s user 0.04s system 100% cpu 8.048 total
$ time bzip2 -c9 etc.tar > etc.tar.bz2
bzip2 -c9 etc.tar > etc.tar.bz2  8.12s user 0.04s system 99% cpu 8.170 total
$ time lzma -c9 etc.tar > etc.tar.lzma
lzma -c9 etc.tar > etc.tar.lzma  36.67s user 0.38s system 99% cpu 37.055 total
$ ls -lh etc.tar*
-rw-r--r-- 1 aaron aaron  37M 2008-12-14 13:52 etc.tar
-rw-r--r-- 1 aaron aaron 2.8M 2008-12-14 13:49 etc.tar.bz2
-rw-r--r-- 1 aaron aaron 4.0M 2008-12-14 13:47 etc.tar.gz
-rw-r--r-- 1 aaron aaron 1.5M 2008-12-14 13:50 etc.tar.lzma
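As an aside, GNU tar can run the compressor itself in one step: -z for gzip, -j for bzip2, and, on builds with LZMA support, --lzma. A minimal sketch (note this is a convenience, not what I timed above; tar's built-in filters use each tool's default compression level, so to force -9 you'd still pipe through the compressor manually):

```shell
# Sketch: let GNU tar invoke the compressor itself (default levels).
# --lzma requires a tar built with LZMA support; check `tar --help`.
mkdir -p demo && echo "some text" > demo/file.txt

tar -czf demo.tar.gz  demo    # gzip
tar -cjf demo.tar.bz2 demo    # bzip2
# tar --lzma -cf demo.tar.lzma demo   # only if your tar supports it

ls -l demo.tar.*
```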
As you can clearly see, when cranking up the compression on the TAR file, BZIP2's runtime is comparable to GZIP's, while LZMA takes nearly five times as long to complete. However, the space saved for that time is significant: 1.5 MB versus the 4 MB coming from GZIP. I'm not 100% convinced, though. Let's sic it on some binary data. I have another TAR file, but this time with JPEGs and AVIs from my camera. Let's see the results here:
$ cd /media/NIKON/DCIM/103NIKON/
$ tar -cf ~/pics.tar *
$ cd
$ time gzip -c9 pics.tar > pics.tar.gz
gzip -c9 pics.tar > pics.tar.gz  7.18s user 0.22s system 85% cpu 8.690 total
$ time bzip2 -c9 pics.tar > pics.tar.bz2
bzip2 -c9 pics.tar > pics.tar.bz2  25.44s user 0.31s system 99% cpu 25.841 total
$ time lzma -c9 pics.tar > pics.tar.lzma
lzma -c9 pics.tar > pics.tar.lzma  68.49s user 0.82s system 99% cpu 1:09.46 total
$ ls -lh pics.tar*
-rw-r--r-- 1 aaron aaron 111M 2008-12-14 14:09 pics.tar
-rw-r--r-- 1 aaron aaron 108M 2008-12-14 14:05 pics.tar.bz2
-rw-r--r-- 1 aaron aaron 110M 2008-12-14 14:04 pics.tar.gz
-rw-r--r-- 1 aaron aaron 109M 2008-12-14 14:07 pics.tar.lzma
Yeah… LZMA isn't giving me a lot here. In fact, I find it interesting that BZIP2 won in terms of the smallest size. Now, granted, I'm already aware that JPEG and AVI files are compressed to begin with, so I'm not looking to gain a lot here. As already mentioned, this is mostly a quest of curiosity. Again, notice the times: over a minute to complete with LZMA, where GZIP only took 8 seconds. However, let's see what each does on a file of nothing but binary zeros. Pulling from /dev/zero, I can create a file of any arbitrary size. So, let's create a 512 MB file and sic the compression algorithms on it:
$ dd if=/dev/zero of=file.zero bs=512M count=1
1+0 records in
1+0 records out
536870912 bytes (537 MB) copied, 12.4654 s, 43.1 MB/s
$ time gzip -c9 file.zero > file.zero.gz
gzip -c9 file.zero > file.zero.gz  4.86s user 0.18s system 99% cpu 5.052 total
$ time bzip2 -c9 file.zero > file.zero.bz2
bzip2 -c9 file.zero > file.zero.bz2  11.35s user 0.24s system 100% cpu 11.586 total
$ time lzma -c9 file.zero > file.zero.lzma
lzma -c9 file.zero > file.zero.lzma  189.81s user 0.92s system 99% cpu 3:10.73 total
$ ls -lh file.zero*
-rw-r--r-- 1 aaron aaron 512M 2008-12-14 14:14 file.zero
-rw-r--r-- 1 aaron aaron  402 2008-12-14 14:23 file.zero.bz2
-rw-r--r-- 1 aaron aaron 509K 2008-12-14 14:23 file.zero.gz
-rw-r--r-- 1 aaron aaron  75K 2008-12-14 14:27 file.zero.lzma
Heh. All I can say is heh. BZIP2 again took the top prize for being the most compressed, getting 512 MB into a mere 402 bytes, and it only took 6 extra seconds compared to GZIP. LZMA, while compressing fairly well, did miserably in the reported time. Three minutes?! On binary zeros?! What was it doing? Watching some YouTube while doing the compression?
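The gap between the camera archive and the zero file comes down to redundancy: already-compressed data looks statistically random, and no general-purpose compressor can shrink random bytes. A quick sketch of the same effect at a smaller scale (1 MB instead of 512 MB, using the same gzip invocation as above):

```shell
# Random bytes stand in for already-compressed input (JPEG/AVI);
# zeros stand in for maximally redundant input.
dd if=/dev/urandom of=rand.bin bs=1M count=1 2>/dev/null
dd if=/dev/zero    of=zero.bin bs=1M count=1 2>/dev/null

gzip -c9 rand.bin > rand.bin.gz    # stays around 1 MB (may even grow a bit)
gzip -c9 zero.bin > zero.bin.gz    # collapses to roughly a kilobyte

ls -l rand.bin.gz zero.bin.gz
```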
All in all, I'm not impressed with LZMA. It's a horrible performer, and only gives marginal results. It seems to do well on ASCII text, but falls short on binary files, where BZIP2 takes the clear win in compression. While it may pull out some impressive compression, the time it takes to perform isn't worth it. BZIP2 is a much more capable algorithm, and although it's a slow performer too, it's not nearly as bad as LZMA. It would be worth my while to use BZIP2 whenever possible, reaching for GZIP only when time is the primary factor.
I would be interested in some other benchmarks on different data, if anyone has access to those. I think these results give us a good idea about LZMA, though: STEER CLEAR.
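For anyone who wants to repeat this on their own data, the whole run boils down to a small loop. This is just a sketch of the procedure above; it assumes gzip and bzip2 are on the PATH, and that lzma is installed if you add it to the list:

```shell
#!/bin/sh
# Usage: ./bench.sh somefile
# Compresses the argument with each tool at its maximum level and
# leaves the results side by side for comparison.
f="$1"
for tool in gzip bzip2; do        # append lzma here if you have it
    time "$tool" -c9 "$f" > "$f.$tool"
done
ls -lh "$f" "$f".*
```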