Re: [patch 4/8] mm: allow not updating BDI stats inend_page_writeback()

From: Peter Zijlstra
Date: Tue Mar 18 2008 - 09:08:43 EST


On Tue, 2008-03-18 at 13:51 +0100, Miklos Szeredi wrote:
> > Yes, it does two things, _however_ those two things are very much
> > related. Your use-case that breaks this relation is an execption - and I
> > haven't really grasped it yet..
> >
> > I'm in general not too keen about you having to export the BDI
> > accounting stuff and using it explicitly like this, but I'm afraid I
> > don't see a way around it - the danger is that other filesystems will
> > get creative (hence the req for GPL - that excludes the most creative
> > ones).
> >
> > Yes, it makes sense to delay the write completion accounting until its
> > actually completed.. but I would suggest all writeback accounting.
>
> Doesn't work, as long as we have throttle_vm_writeout() waiting for
> NR_WRITEBACK to go below a threshold, delaying the NR_WRITEBACK
> accounting could lead to a deadlock.
>
> So at least until that's resolved NR_WRITEBACK_TEMP needs to be
> separate from NR_WRITEBACK_TEMP. And it makes sense possibly even
> after that, as they are fundamentally different things. The first one
> is page cache pages being under writeout, the second is just kernel
> buffers (mostly) unrelated to the page cache.

Urgh, throttle_vm_writeout() again.. Agreed, that'll deadlock.

> > So the thing that's in your way is that removing a page from the radix
> > tree doesn't imply its done writing. So perhaps we should make that
> > distinction instead?
> >
> > So instead of conditionally do part of the accounting, never do it and
> > require something like: page_writeback_complete() to be called after a
> > successfull test_clear_page_writeback().
>
> Yes, that's a possibility, but then normal filesystems miss out on the
> small optimization provided by doing the BDI accounting functions
> inside the same IRQ disabled region as the radix tree operation.
> Would that have any significant performance impact?

Yeah, realized that. Don't know, would have to measure it somehow...
some archs are rather slow with disabling IRQs, but we're talking about
writeout which should be dominated by the IO times.

Its just that your proposal exposes too much guts, I'd like the
interface to be a little higher level.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/