Re: [00/17] Large Blocksize Support V3

From: Andrew Morton
Date: Fri Apr 27 2007 - 01:16:46 EST


On Fri, 27 Apr 2007 14:20:46 +1000 David Chinner <dgc@xxxxxxx> wrote:

> > blocksizes via this scheme - instantiate and lock four pages and go for
> > it.
>
> So now how do you get block aligned writeback?

in writeback and pageout:

if (page->index & mapping->block_size_mask)
continue;

> Or make sure that truncate
> doesn't race on a partial *block* truncate?

lock four pages

> You basically have to
> jump through nasty, nasty hoops, to handle corner cases that are introduced
> because the generic code can no longer reliably lock out access to a
> filesystem block.
>
> Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
> doing everything inside the filesystem because it's the only way sane
> way to serialise access to these aggregated structures. This is
> the way XFS used to work in it's data path, and we all know how long
> and loud people complained about that.....
>
> A filesystem specific aggregation mechanism is not a palatable solution
> here because it drives filesystems away from being able to use generic
> code.

I would expect we could (should) implement this in generic code by
modifying the existing stuff.

I'm not saying it's especially simple, nor fast. But it has the advantage
that we're not forced to use larger pages with _it's_ attendant performance
problems.

And it will benefit all filesystems immediately.

And it doesn't introduce a rather nasty hack of pretending (in some places)
that pages are larger than they really are.

And it has the very significant advantage that it doesn't introduce brand
new concepts and some complexity into core MM.

And make no mistake: the latter disadvantage is huge. Because if we do the
PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*.
Maintaining and enhancing core MM and VFS becomes harder and more costly
and slower and more buggy *for ever*. The ramp for people to become
competent on core MM becomes longer. Our developer pool becomes smaller, and
proportionally less skilled.

And hardware gets better. If Intel & AMD come out with a 16k pagesize
option in a couple of years we'll look pretty dumb. If the problems which
you're presently having with that controller get sorted out in the next
generation of the hardware, we'll also look pretty dumb.

As always, there are tradeoffs. We can see the cons, and they are very
significant. We don't yet know the pros. Perhaps they will be similarly
significant. But I don't believe that the larger PAGE_CACHE_SIZE hack
(sorry) is the only way in which they can be realised.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/