Re: [patch 10/21] buffer heads: Support slab defrag

From: Andrew Morton
Date: Wed May 21 2008 - 02:42:22 EST


On Wed, 21 May 2008 10:15:32 +0400 Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:

> On Tue, May 20, 2008 at 04:28:16PM -0700, Andrew Morton (akpm@xxxxxxxxxxxxxxxxxxxx) wrote:
> > It's more than efficiency. There are lots and lots of things we cannot
> > do in direct-reclaim context.
> >
> > a) Can't lock pages (well we kinda sorta could, but generally code
> > will just trylock)
> >
> > b) Cannot rely on the inode or the address_space being present in
> > memory after we have unlocked the page.
> >
> > c) Cannot run iput(). Or at least, we couldn't five or six years
> > ago. afaik nobody has investigated whether the situation is now
> > better or worse.
> >
> > d) lots of deadlock scenarios - need to test __GFP_FS basically everywhere
> > in which you share code with normal writeback paths.
> >
> > Plus e), f), g) and h). Direct-reclaim is a hostile environment.
> > Things like b) are a real killer - nasty, subtle, rare,
> > memory-pressure-dependent crashes.
>
> Which basically means we can not do direct writeback at reclaim time?..
>

Well, we _can_, but doing so within the present constraints is delicate.

An implementation which locked all the to-be-written pages up front and
then wrote them out and which was careful not to touch the inode or
address_space after the last page is unlocked could work.
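
To make the lock-everything-up-front idea concrete, here is a rough
userspace sketch. It is a model only, not kernel code: batch_page,
batch_trylock(), batch_unlock() and batch_write() are hypothetical names,
and a plain flag stands in for the page lock. It shows the three phases:
trylock every page and back off on contention, write while all locks are
held, then unlock and touch nothing inode-related afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; 'locked' models the page lock. */
struct batch_page {
    bool locked;
    bool dirty;
};

static bool batch_trylock(struct batch_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void batch_unlock(struct batch_page *p)
{
    p->locked = false;
}

/* Returns the number of pages written, or 0 if the batch backed off. */
size_t batch_write(struct batch_page **pages, size_t n)
{
    size_t i, written = 0;

    /* Phase 1: trylock every page; never block in reclaim context. */
    for (i = 0; i < n; i++) {
        if (!batch_trylock(pages[i])) {
            /* Contention: drop the locks taken so far and give up. */
            while (i > 0)
                batch_unlock(pages[--i]);
            return 0;
        }
    }

    /* Phase 2: all pages are locked, so the inode stays pinned here. */
    for (i = 0; i < n; i++) {
        if (pages[i]->dirty) {
            pages[i]->dirty = false;    /* models writing the page out */
            written++;
        }
    }

    /* Phase 3: unlock. After the last unlock the inode may vanish,
     * so nothing below this loop may dereference it. */
    for (i = 0; i < n; i++)
        batch_unlock(pages[i]);

    return written;
}
```

The count is accumulated in a local before the unlock loop precisely so
that nothing has to be read back from shared state once the last page
lock is gone.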

Or perhaps add a new lock to the inode and then, in reclaim:

a) lock a page on the LRU, thus pinning the address_space and inode.

b) take some new sleeping lock in the inode

c) unlock that page and now proceed to do writeback, while still
honouring !__GFP_FS (no filesystem re-entry if the caller's allocation
did not permit it).

and teach the unmount code to take the per-inode locks too, to ensure
that reclaim has got out of there before zapping the inodes. Perhaps a
per-superblock lock rather than per-inode, dunno.
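
The ordering in a) to c), plus the unmount side, could be modelled
roughly as below. This is a single-file userspace sketch under stated
assumptions, not a proposed patch: every model_* name is hypothetical,
a C11 atomic_flag spin stands in for what would be a sleeping lock in
the kernel, the 'alive' flag is an artifact of the model that stands in
for "unmount has zapped the inode", and may_enter_fs models the
__GFP_FS test.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model types; none of these are kernel structures. */
struct model_inode {
    atomic_flag reclaim_lock;   /* the "new lock" from step b) */
    bool alive;                 /* cleared when unmount zaps the inode */
    int pages_written;
};

struct model_page {
    atomic_flag lock;           /* stands in for the page lock */
    struct model_inode *inode;
    bool dirty;
};

void model_inode_init(struct model_inode *inode)
{
    atomic_flag_clear(&inode->reclaim_lock);
    inode->alive = true;
    inode->pages_written = 0;
}

void model_page_init(struct model_page *page, struct model_inode *inode)
{
    atomic_flag_clear(&page->lock);
    page->inode = inode;
    page->dirty = true;
}

/* Spin until acquired; the real lock would sleep instead. */
static void model_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l))
        ;
}

static void model_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Returns true if the page was written, false if reclaim backed off. */
bool model_reclaim_writeback(struct model_page *page, bool may_enter_fs)
{
    struct model_inode *inode;
    bool wrote = false;

    /* a) trylock the page; holding it pins the inode in this model. */
    if (atomic_flag_test_and_set(&page->lock))
        return false;
    inode = page->inode;

    /* b) take the per-inode lock while the page lock still pins it. */
    model_lock(&inode->reclaim_lock);

    /* c) drop the page lock; the inode lock now keeps unmount out. */
    model_unlock(&page->lock);

    /* Writeback proper, still honouring the no-filesystem case. */
    if (may_enter_fs && inode->alive && page->dirty) {
        page->dirty = false;
        inode->pages_written++;
        wrote = true;
    }

    model_unlock(&inode->reclaim_lock);
    return wrote;
}

/* Unmount takes the same lock, so it waits for reclaim to get out. */
void model_unmount_inode(struct model_inode *inode)
{
    model_lock(&inode->reclaim_lock);
    inode->alive = false;
    model_unlock(&inode->reclaim_lock);
}
```

The point of the ordering is that there is never a window in which
reclaim holds neither the page lock nor the inode lock while it still
has the inode pointer in hand.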

But we won't be able to just dive in there and call the existing
writeback functions from within reclaim, because

a) callers can hold all sorts of locks, including implicit ones such
as journal_start() and

b) reclaim doesn't have a reference on the page's inode, and the
inode and address_space can vanish if reclaim isn't holding a lock
on one of the address_space's pages.
