Re: [kernel.org users] XZ Migration discussion

From: Justin P. Mattock
Date: Sun Feb 14 2010 - 04:34:19 EST


On 02/14/10 01:23, Jean Delvare wrote:
On Sat, 13 Feb 2010 23:52:17 +0000, Phillip Lougher wrote:
Jean Delvare wrote:


Compared to bz2, gz saves... 2% on the overall time. In conclusion, I
think we can plainly discard the argument "I need .gz because my machine
is slow" from now on. It simply doesn't hold.


I agree, but, IMHO the main argument for keeping .gz is cross-platform
availability and wide language support, not hardware limitations. Doing
a quick google brings up .gz interfaces for every language you can think
of (C, Java, Perl, Python, Tcl etc.), not to mention completely separate
implementations in Java and Pascal (not just wrappers on top of the zlib
library), and probably more.

With xz you have just one C/C++ implementation with a single library with
an undocumented API for C/C++ programmers.

This can probably be easily explained. gz decompresses very fast, so
it is a very good choice for transparent decompression of files which
must be accessible quickly but aren't used frequently. Manual pages or
printer drivers come to mind. bz2 and lzma, OTOH, are meant for longer
term archiving. Their compression ratio benefit is only worth it for
larger files that you don't access that frequently.

I am not claiming that gzip is dead. It is very useful and it is here
to stay for years to come, no doubt about that. What I'm saying is
that it isn't the best choice for large files to be downloaded from a
remote server.

It may be a slight stretch of the imagination, but with .gz you can
conceive of programmers writing programs to download a .gz from kernel.org
and decompress/search it, in almost any language of choice. With the Java
implementation .gz is genuinely cross platform and you don't need glibc/
C++ compilers, just a Java VM. Contrast with xz, where if the xz utility
isn't available, or doesn't do what you want, you're stuck with programming
in C/C++ with all the baggage that entails.
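
For what it's worth, here is a minimal sketch of that kind of program in
Python, using only the standard gzip module. The function name grep_gz and
the sample data are made up for illustration, and the download step is left
out entirely; this only shows the decompress-and-search part.

```python
import gzip
import io


def grep_gz(gz_bytes, needle):
    """Return (line_number, line) pairs from gzip data whose text contains needle."""
    hits = []
    # gzip.open accepts a file object, so the compressed bytes never touch disk.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt",
                   encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if needle in line:
                hits.append((lineno, line.rstrip("\n")))
    return hits


# Hypothetical sample standing in for a file fetched from a server.
sample = gzip.compress(b"int main(void)\n{\n\treturn 0;\n}\n")
print(grep_gz(sample, "return"))
```

The data is decompressed as a stream, line by line, so memory use stays
small even for a large file.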

Honestly, I don't think we care at all when it comes to the kernel.org
files. Accessing individual files inside a compressed kernel tarball
without first expanding it entirely would be horribly slow and
impractical, no matter which compression format was used. I can't think
of any case where you wouldn't unpack the tarball first, and for this
task an external tool will do just fine.
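
To illustrate why: a tar stream has no index, so reaching one member means
decompressing everything in front of it. A rough Python sketch follows. The
helper names build_tar_xz and read_member are hypothetical, the tiny
in-memory archive merely stands in for a kernel tarball, and it assumes a
Python whose standard library includes xz support in tarfile.

```python
import io
import tarfile


def build_tar_xz(files):
    """Build a small .tar.xz in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive_bytes, wanted):
    """Stream through the archive; return (data, members_scanned)."""
    scanned = 0
    # "r|*" reads the archive strictly front to back, with no seeking.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r|*") as tar:
        for member in tar:
            scanned += 1
            if member.name == wanted:
                return tar.extractfile(member).read(), scanned
    return None, scanned


archive = build_tar_xz({"a/Makefile": b"all:\n", "a/README": b"hello\n"})
data, scanned = read_member(archive, "a/README")
print(data, scanned)
```

Every member ahead of the wanted one has to be decompressed and skipped,
whichever compression format wraps the tar stream.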

And, once again, there are several public instances of gitweb and LXR
available if you only want to browse the code.


just out of curiosity, what would happen if, say,
I take a file and turn it into a .gz, then turn the .gz
into a .xz, or vice versa?

so at the end of the day you have a list of .gz's (or whatever),
then depending on the type (.gz, .bz2, etc.) you unpack and voila, you get
either a tree or some other compressed file (.bz2, .xz, or .gz).
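
Read as nested compression (a .gz wrapped in a .xz), the idea can be
sketched roughly like this in Python. The DECOMPRESSORS table and the
function names are hypothetical, and it assumes a Python whose standard
library includes the gzip, bz2 and lzma modules. Each pass looks at the
file extension, strips one layer, and repeats until nothing recognizable
is left.

```python
import bz2
import gzip
import lzma

# Hypothetical extension -> decompressor table for this sketch.
DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
    ".xz": lzma.decompress,
}


def unpack_once(name, data):
    """Strip one compression layer if the extension is recognized."""
    for suffix, decompress in DECOMPRESSORS.items():
        if name.endswith(suffix):
            return name[:-len(suffix)], decompress(data)
    return name, data


def unpack_all(name, data):
    """Keep stripping layers until the name has no known extension."""
    while True:
        new_name, new_data = unpack_once(name, data)
        if new_name == name:
            return name, data
        name, data = new_name, new_data


payload = b"obj-y += main.o\n" * 4
nested = lzma.compress(gzip.compress(payload))  # a .gz wrapped in a .xz
print(unpack_all("Makefile.gz.xz", nested))
```

Note that compressing already-compressed data usually gains little, so the
outer layer mostly adds time rather than saving space.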

just thinking out loud (so don't shoot me please).

Justin P. Mattock