Re: Is there a "make hole" (truncate in middle) syscall?

From: Jörn Engel
Date: Fri Dec 12 2003 - 07:28:23 EST


On Thu, 11 December 2003 13:58:54 -0600, Andy Isaacson wrote:
> On Thu, Dec 11, 2003 at 08:48:15PM +0100, Jörn Engel wrote:
> >
> > If you really do it, please don't add a syscall for it. Simply check
> > each written page if it is completely filled with zero. (This will be
> > a very quick check for most pages, as they will contain something
> > nonzero in the first couple of words)
>
> Um, no.
>
> That is a very bad idea. Your suggestion would make it impossible to
> actually write a block of all-zeros to the disk. That makes it
> impossible to pre-allocate disk space.

How about implementing it inside the individual filesystems? Then
each filesystem can figure out a logic that suits it's special needs.

What I would sometimes like to have goes even beyond this. Create a
simple hash for each size-x chunk of a file, and check against those
hashes whenever writing. If hashes are identical, compare the chunks
and if those are identical as well, just create another link to that
chunk. Kinda like rsync on a filesystem level, only without the
rolling checksum thing.

Yes, you can do a lot of this from userspace, but hard links don't
have a copy-on-write semantic, so this often breaks things, unless
*all* userspace programs break hard links before modifying files.

Also, this effectively compresses your data, so you need less
bandwidth to the cache and less cache size. Whatever applies to code
size and L1 cache should apply here as well. Sure, the disk access
pattern may be worse, but who cares, if data sets suddenly fit into
memory?

Oh yes, this would also give you my proposed zero-block detection for
free.

> Another syscall is precisely the correct thing to do. (I don't think
> make_hole() is a special case of any extant syscall.)

Depends. My proposal has a bunch of problems. We won't have it
implemented by next year. I buy all that. Maybe we can do it with
10% kernel code and 90% userspace, maybe not. Most likely the first
couple of implementations create more problems than they solve, yes.

But we should get there some day. Having 15 nearly identical copies
of the kernel on my notebook is a pain and hard links simply have the
wrong semantics.

Jörn

--
Anything that can go wrong, will.
-- Finagle's Law
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/