Re: Is there a "make hole" (truncate in middle) syscall?

From: Andy Isaacson
Date: Thu Dec 11 2003 - 13:59:06 EST


On Wed, Dec 10, 2003 at 09:13:49PM -0800, Hua Zhong wrote:
> This would be a tremendous enhancement to Linux filesystems, and one of
> my current projects actually needs this capability badly.
>
> The project is a lightweight user-space library which implements a
> file-based database. Each database has several files. The files are all
> block-based, and each block is always a multiple of 512 bytes (and we
> could make it a multiple of 4K, if this feature existed).
>
> Blocks are organized as a B+ tree, so we have a root block, which points
> to its child blocks, and in turn they point to the next level. There is
> a free block list too.
>
> The problem is that with a lot of add/delete, there are a lot of free
> blocks inside the file. So essentially we'd have to manually shrink
> these files when they grow too big and eat up too much space. If we
> could just "dig
> a hole", it would be trivial to return those blocks to the filesystem
> without doing an expensive defragmentation.

The abstract interface for make_hole() is simple, but I think it turns
into a pretty expensive filesystem operation. After many cycles of
free/allocate, your file would be badly fragmented across the
filesystem. You'll probably get better overall performance by keeping
track of how "sparse" your file is (you could compare st_blocks from
stat(2), which counts 512-byte units, against the number of blocks in
use in your tree structure) and re-writing it when you're wasting more
than, say, 20% of the allocated space.
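
A minimal sketch of that sparseness check, in Python for brevity. It
assumes st_blocks is reported in 512-byte units (as on Linux) and that
the library can report how many of its blocks are in use.
wasted_fraction(), should_rewrite(), live_blocks and block_size are
hypothetical names for illustration, not part of any existing API.

```python
import os

ST_BLOCK_UNIT = 512  # assumption: st_blocks counts 512-byte units, as on Linux


def wasted_fraction(st_blocks, live_blocks, block_size):
    """Fraction of the file's on-disk space not covered by in-use tree blocks."""
    allocated = st_blocks * ST_BLOCK_UNIT
    if allocated == 0:
        return 0.0
    live = live_blocks * block_size
    # Clamp at zero in case st_blocks lags behind what the tree reports.
    return max(0, allocated - live) / allocated


def should_rewrite(path, live_blocks, block_size, threshold=0.20):
    """True when more than `threshold` of the allocated space is wasted."""
    st = os.stat(path)
    return wasted_fraction(st.st_blocks, live_blocks, block_size) > threshold
```

The 20% threshold is just the figure suggested above; the right value
depends on how expensive a rewrite is for the database in question.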

It turns into an interesting problem if you don't want to double your
space requirements during the re-write process. You could write the
new file "backwards", one MB at a time, truncating the previous file at
each step to free up the blocks. You'd end up with contiguous 1MB
chunks, which given your tree organization is probably good enough. If
you wanted really good streaming performance you'd want to do bigger
chunks (or just write the file from the beginning, or use the
pre-allocation APIs that I think XFS provides).
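
A rough sketch of the "backwards" copy, again in Python. It is a plain
tail-to-head copy with no compaction: a real rewrite would emit only
the in-use blocks and fix up the tree's block pointers, which is not
shown. It is also not crash-safe, since a failure partway through
leaves the data split across two files. rewrite_backwards() and its
parameters are hypothetical names.

```python
import os

CHUNK = 1 << 20  # 1 MB steps, as suggested in the text


def rewrite_backwards(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst from tail to head, truncating src after each chunk."""
    src = os.open(src_path, os.O_RDWR)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        end = os.fstat(src).st_size
        # Start of the last (possibly partial) chunk.
        offset = (end - 1) // chunk * chunk if end else 0
        while end > 0:
            length = end - offset
            data = b""
            while len(data) < length:  # pread may return short reads
                part = os.pread(src, length - len(data), offset + len(data))
                if not part:
                    raise OSError("unexpected end of file")
                data += part
            written = 0
            while written < length:  # pwrite may write fewer bytes
                written += os.pwrite(dst, data[written:], offset + written)
            os.fsync(dst)              # make the copy durable first
            os.ftruncate(src, offset)  # then release the old blocks
            end = offset
            offset = max(0, offset - chunk)
    finally:
        os.close(src)
        os.close(dst)
```

At any moment the extra space in use is roughly one chunk, because the
not-yet-written head of the new file is a hole until its turn comes.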

-andy