Re: [RFC] [PATCH 5/8] inode-diet: Eliminate i_blksize and use a per-superblock default

From: Nate Diller
Date: Wed Jun 21 2006 - 15:39:58 EST


On 6/19/06, Theodore Tso <tytso@xxxxxxx> wrote:
On Mon, Jun 19, 2006 at 09:16:51AM -0700, Joel Becker wrote:
> On Mon, Jun 19, 2006 at 04:58:21PM +0100, Christoph Hellwig wrote:
> > Blease don't add a field to the superblock for the optimal I/O size.
> > Any filesystem that wants to override it can do so directly in ->getattr.
>
> I don't disagree with you, but the idea of everyone implementing
> ->getattr where they just let it work before scares me. It's a ton of
> cut-n-paste error waiting to happen. Especially if we make something
> stale.
> Perhaps add generic_fillattr_blksize()?

Well, as far as I know the only filesystems today that would need to
do something different are xfs, ocfs2, and reiserfs, and IMHO only the
first two have any kind of justification for doing it. Part of the
problem is what st_blksize actually means was never well-defined; it
was never in POSIX, and in SuSv3 all that is stated is, "A file
system-specific preferred I/O block size for this object." This is
why Reiserfs got away with specifying 128 megs (I assume it helped on
some benchmark), and why being ill-defined, using such a large value
might cause some applications (like /bin/cp) to core dump.

Given that most filesystems use the generic page cache read/write
functions, using PAGE_CACHE_SIZE as the default seems to make a huge
amount of sense. I really wonder how useful setting st_blksize really
is, actually, at least in the real-world, as opposed to just for
benchmarks.

I assume "128 megs" is a typo, it's 128k, of course. And certainly it
would have helped speed things up, not just benchmarks, because for
modern disks, doing 128k vs 4k takes like 25% more time. Wu's
adaptive readahead patches might make this outdated, though, since
they now support "read 10 pages, seek, read 10 pages, seek, etc" type
workloads.

NATE
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/