Re: [RFC/PATCH 0/2] ext4: Transparent Decompression Support

From: Taras Glek
Date: Thu Jul 25 2013 - 12:51:17 EST




Dhaval Giani wrote:
On 07/24/2013 07:36 PM, Jörn Engel wrote:
On Wed, 24 July 2013 17:03:53 -0400, Dhaval Giani wrote:
I am posting this series early in its development phase to solicit some
feedback.
At this state, a good description of the format would be nice.

Sure. The format is quite simple. There is a 20 byte header followed by an offset table giving us the offsets of the zlib-compressed 16k chunks (16k is the default; it can be changed with the szip tool, and the kernel should still decompress the file because the chunk size is recorded in the header). I am not tied to the format. I used it because that is what is being used here. My final goal is to have the filesystem agnostic of the compression format, as long as it is seekable.
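
To make the "header, offset table, independently compressed chunks" idea concrete, here is a minimal userspace sketch in Python. It is not the szip format and not the patch's code: the field layout (magic, uncompressed size, chunk size, chunk count, reserved), the magic value, and the names pack() and read_range() are all assumptions chosen only so that the header comes out at 20 bytes. What it shows is why such a format is seekable: a read at an arbitrary offset inflates only the chunks that overlap the requested range.

```python
import struct
import zlib

# Hypothetical 20-byte header, NOT the real szip layout:
# magic (4 bytes), then uncompressed size, chunk size, chunk count, reserved
# (four little-endian uint32 values).
HEADER = struct.Struct("<4sIIII")
MAGIC = b"SKZ0"  # made-up magic value for this sketch


def pack(data, chunk_size=16 * 1024):
    """Compress data in independent chunks; prepend header and offset table."""
    chunks = [zlib.compress(data[i:i + chunk_size])
              for i in range(0, len(data), chunk_size)]
    # n + 1 offsets, relative to the start of the chunk area; the last
    # entry marks the end of the final chunk.
    offsets = [0]
    for c in chunks:
        offsets.append(offsets[-1] + len(c))
    header = HEADER.pack(MAGIC, len(data), chunk_size, len(chunks), 0)
    table = struct.pack("<%dI" % len(offsets), *offsets)
    return header + table + b"".join(chunks)


def read_range(blob, pos, length):
    """Return data[pos:pos + length], inflating only the overlapping chunks."""
    magic, total, chunk_size, n_chunks, _ = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a seekable container")
    end = min(pos + length, total)
    if pos >= end:
        return b""
    table = struct.unpack_from("<%dI" % (n_chunks + 1), blob, HEADER.size)
    base = HEADER.size + 4 * (n_chunks + 1)  # start of the chunk area
    first, last = pos // chunk_size, (end - 1) // chunk_size
    out = b"".join(
        zlib.decompress(blob[base + table[i]:base + table[i + 1]])
        for i in range(first, last + 1))
    skip = pos - first * chunk_size
    return out[skip:skip + (end - pos)]
```

Because the chunk size is read back from the header rather than hard-coded, read_range() keeps working if pack() is called with a different chunk_size, which mirrors the point about the szip tool above.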


We are implementing transparent decompression with a focus on ext4. One
of the main use cases is that of Firefox on Android. Currently libxul.so
is compressed and it is loaded into memory by a custom linker on
demand. With the use of transparent decompression, we can make do
without the custom linker. More details (i.e. code) about the linker can
be found at https://github.com/glandium/faulty.lib
It is not quite clear what you want to achieve here.

To introduce transparent decompression. Let someone else do the compression for us, and supply decompressed data on demand (in this case on a read call). This reduces the complexity which would otherwise have to be brought into the filesystem.

The main use for file compression for Firefox (it's useful on the Linux desktop too) is to improve IO throughput and reduce startup latency. In order for compression to be a net win, an application should be aware of what is being compressed and what isn't. For example, IO patterns on large libraries (e.g. the 30MB libxul.so) are well suited to compression, but those on SQLite databases are not. Similarly for our disk cache: images should not be compressed, but JavaScript should be. Footprint wins are useful on Android, but it's the increased IO throughput on crappy storage devices that makes this most attractive.
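
As a rough illustration of "being aware of what is being compressed", an application could test a sample of a file before deciding. The sketch below is only that: worth_compressing() and the 0.9 threshold are hypothetical, and this is not how Firefox makes the decision. It also covers only compressibility; the SQLite case is about access patterns, which a size check does not capture.

```python
import os
import zlib


def worth_compressing(sample, max_ratio=0.9):
    """Return True if zlib shrinks the sample below max_ratio of its size.

    max_ratio is an arbitrary cut-off picked for this sketch.
    """
    if not sample:
        return False
    ratio = len(zlib.compress(sample)) / len(sample)
    return ratio < max_ratio


# Repetitive source text, standing in for a cached JavaScript file.
js_like = b"function add(a, b) { return a + b; }\n" * 500
# Random bytes, standing in for already-compressed image data.
image_like = os.urandom(64 * 1024)

for name, sample in (("js", js_like), ("image", image_like)):
    print(name, worth_compressing(sample))
```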

In addition to being aware of which files should be compressed, Firefox is aware of the usage patterns of its various files, so it could schedule compression at the most suitable time.

The above needs tie in nicely with the simplification of not implementing compression at the fs level.

One approach is
to create an empty file, chattr it to enable compression, then write
uncompressed data to it. Nothing in userspace will ever know the file
is compressed, unless you explicitly call lsattr.

If you want to follow some other approach where userspace has one
interface to write the compressed data to a file and some other
interface to read the file uncompressed, you are likely in a world of
pain.
Why? If only a few applications are going to know that the file is compressed, and read it to get decompressed data, why would it be painful? What about introducing a new flag, O_COMPR, which tells the kernel, btw, we want this file to be decompressed if it can be? It can fall back to O_RDONLY or something like that. That gets rid of the chattr ugliness.

This transparent decompression idea is based on our experience with HFS+. Apple uses the fs-attribute approach. OS X is able to compress application libraries at installation time; apps remain blissfully unaware but get an extra boost in startup perf.

So in Linux, the package manager could compress .so files, textual data files, etc.

Assuming you use the chattr approach, that pretty much comes down to
adding compression support to ext4. There have been old patches for
ext2 around that never got merged. Reading up on the problems
encountered by those patches might be instructive.

Do you have subjects for these? When I googled for ext4 compression, I found http://code.google.com/p/e4z/ which doesn't seem to exist, and checking in my LKML archives gives too many false positives.

Thanks!
Dhaval