Re: blocksize > 4K in ext2 ?

Theodore Y. Ts'o (tytso@mit.edu)
Thu, 21 May 1998 12:34:25 -0400


Large blocksizes are an attempt to make filesystems more efficient in
terms of speed, but have the downside that they decrease storage
efficiency, since the last block of each file is on average only half
full. That's why people then want to add support for fragments.

Ext2fs already tries to solve this problem in a nicer way, which is to
try very hard to allocate files contiguously (by preallocation and other
means); this gives you the performance benefits of larger blocksizes
while still keeping the storage efficiency of using 1k blocksizes,
without needing the complication of using fragments.
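
To put a rough number on the storage side of that tradeoff, here is a
small sketch that adds up the space wasted in the last, partly filled
block of each file. The file sizes are made up purely for illustration,
and the function names are mine, not anything from ext2.

```python
def slack_bytes(file_size, block_size):
    """Bytes left unused in the last block of a file of the given size."""
    remainder = file_size % block_size
    # An empty file, or one that ends exactly on a block boundary, wastes nothing.
    return 0 if remainder == 0 else block_size - remainder


def total_slack(file_sizes, block_size):
    """Total unused tail space over a collection of files."""
    return sum(slack_bytes(size, block_size) for size in file_sizes)


# Hypothetical file sizes in bytes, chosen only for illustration.
sizes = [100, 1500, 4096, 5000, 70000]

if __name__ == "__main__":
    for block_size in (1024, 4096):
        print(block_size, total_slack(sizes, block_size))
```

On a real filesystem the ratio depends entirely on the file size
distribution; lots of small files make the larger blocksize look worse.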

In fact, if you look at the report given by e2fsck, for most systems I
expect you'll find results similar to mine --- most files are stored
contiguously on ext2 filesystems already, even on filesystems using 1k
blocksizes, and so there isn't as much benefit in going to larger
blocksizes as there is on some other filesystems.

The only reason why 4k blocksizes are somewhat faster than 1k
blocksizes is large files and their indirect, doubly indirect, and
triply indirect blocks, which force disk accesses to jump around
doing the lookups in the indirect blocks, assuming that they
aren't in the buffer cache already (which they generally are, except for
the genuinely large files). Using larger blocksizes reduces
the number of indirect blocks that you need.
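
As a back-of-the-envelope check, the sketch below counts how many
indirect blocks a fully allocated (non-sparse) file needs. It assumes
the usual ext2 layout of 12 direct block pointers in the inode and
4-byte block numbers; the function name and the 100 MB example are
mine.

```python
DIRECT_POINTERS = 12   # direct block pointers held in the inode
POINTER_SIZE = 4       # bytes per block number


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def indirect_blocks(file_size, block_size):
    """Count indirect blocks of all levels for a non-sparse file."""
    per_block = block_size // POINTER_SIZE        # pointers per indirect block
    remaining = max(0, ceil_div(file_size, block_size) - DIRECT_POINTERS)
    total = 0

    # Single indirect: one block of pointers to data blocks.
    n = min(remaining, per_block)
    if n:
        total += 1
    remaining -= n

    # Double indirect: one top block plus one pointer block per per_block data blocks.
    n = min(remaining, per_block ** 2)
    if n:
        total += 1 + ceil_div(n, per_block)
    remaining -= n

    # Triple indirect: one top block, then two further levels beneath it.
    n = min(remaining, per_block ** 3)
    if n:
        leaf = ceil_div(n, per_block)             # blocks pointing at data
        middle = ceil_div(leaf, per_block)        # blocks pointing at leaf blocks
        total += 1 + middle + leaf
    remaining -= n

    if remaining:
        raise ValueError("file too large for this block size")
    return total


if __name__ == "__main__":
    size = 100 * 1024 * 1024                      # a 100 MB file, for illustration
    for block_size in (1024, 4096):
        print(block_size, indirect_blocks(size, block_size))
```

The gap only opens up for large files; anything that fits in the 12
direct blocks needs no indirect blocks at either blocksize.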

The better solution is to avoid using indirect blocks altogether; given
that most files are stored contiguously, simply storing in the inode a
note to the effect that data blocks begin at block# 1236 and go on for
1087 blocks would avoid the need for any indirect blocks at all.
However, for sparse files, this approach doesn't work well, since the
number of extents you would need to store would be larger than the
space we have in the inode. Stephen Tweedie has suggested using a
B-tree structure to store the extent mappings, which would certainly
work well for those cases where the file is large and sparse, so that
its blocks are scattered all over the disk.
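
To make the extent idea concrete, here is a rough sketch, not a
proposed on-disk format. It collapses a file's block list into
(logical start, physical start, length) runs and looks up a logical
block by binary search. The sorted list with bisect is only a stand-in
for the B-tree, and all the names are hypothetical.

```python
from bisect import bisect_right


def blocks_to_extents(blocks):
    """Collapse physical block numbers, given in logical order, into extents.

    Each extent is a tuple (logical_start, physical_start, length).
    """
    extents = []
    for logical, physical in enumerate(blocks):
        if extents:
            l0, p0, length = extents[-1]
            if physical == p0 + length:           # continues the previous run
                extents[-1] = (l0, p0, length + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


def lookup(extents, logical):
    """Map a logical block number to a physical one, or None if unmapped."""
    starts = [extent[0] for extent in extents]
    i = bisect_right(starts, logical) - 1         # last extent starting at or before it
    if i < 0:
        return None
    l0, p0, length = extents[i]
    if logical < l0 + length:
        return p0 + (logical - l0)
    return None


if __name__ == "__main__":
    # The contiguous case from the text: start at block 1236, run for 1087 blocks.
    contiguous = blocks_to_extents(list(range(1236, 1236 + 1087)))
    print(contiguous)

    # A hypothetical fragmented file needs one extent per run.
    fragmented = blocks_to_extents([10, 11, 12, 50, 51, 7])
    print(fragmented, lookup(fragmented, 4))
```

The contiguous file collapses to a single extent no matter how long it
is, which is the whole point; the fragmented one grows by one entry per
run, which is where the inode runs out of room.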

Once this feature is added, there's no need for using large block sizes
in ext2 filesystems, and hence no need for fragments.

- Ted
