Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

From: Linus Torvalds
Date: Fri Dec 29 2006 - 12:26:36 EST

On Fri, 29 Dec 2006, Nick Piggin wrote:
> > It still has a tiny tiny race (see the comment), but I bet nobody can really
> > hit it in real life anyway, and I know several ways to fix it, so I'm not
> > really _that_ worried about it.
> Well the race isn't a data loss one, is it? Just a case where the
> pte may be dirty but the page dirty state not accounted for.

Right. We should be picking it up eventually, since it's still in the page
tables, but if we've lost sight of the page dirtyness we won't react
correctly to msync() and/or fdatasync(). So we don't _lose_ the data, we
just might not write it out in a timely manner if we ever hit the race.

> Can we fix it by just putting the page_mkclean back inside the
> TestClearPageDirty check, and re-clearing PG_dirty after redoing
> the set_page_dirty?

I considered it, but quite frankly, if we did it that way, I'd really like
to just fix the whole insane "set_page_dirty()" instead.

I think set_page_dirty() should be split up. One thing that confused me
mentally was that almost all of the dirty handling was actualyl done only
if PG_dirty wasn't already set, so the _bulk_ of set_page_dirty() really
ends up being

if (!TestSetPageDirty(page)) {
.. we just marked the page dirty, it was clean before,
so we need to add it to the queues etc ..

and that's the part that I (and probably others) always really thought

But then we have the _one_ thing that runs outside of that "do only once
per dirty bit" logic, and it's the buffer dirtying. If we had had two
separate operations for this all: "set_dirty_every_time()" and the regular
"set_dirty()", I don't think this would have been nearly as confusing.

(And then the difference between "__set_page_dirty_nobuffers()" and
"__set_page_dirty_buffers()" really boils down to one doing the
"everytime" _and_ the "once per dirty" checks and the other one doing just
the "once per dirty bit" act - and we could rename the damn things to
something saner too).

If we split it up that way, then the whole clear_page_dirty_for_io() logic
would boil down to

if (TestClearPageDirty(page)) {
if (page_mkclean(page))
return 1;
return 0;

and we wouldn't even need to do any of the "clear dirty again" kind of
idiocy, because the "set_dirty_every_time()" stuff is the one that doesn't
even care about the state of the PG_dirty bit - it's done regardless, and
doesn't really touch it.

That's what I wanted to do, but with the current "set_page_dirty()" setup,
I think my patch makes reasonable sense.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at