Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

From: Linus Torvalds
Date: Fri Jan 05 2007 - 16:35:41 EST




On Fri, 5 Jan 2007, Christoph Lameter wrote:
>
> It looks as if most code handling the dirty bits already uses the page
> lock?

Much does. But I did some debugging (when trying to figure out the VM
corruption), and certainly not all of it does. And when I looked at some
of the code-paths, I was not at all sure that we could always get the
lock easily.

> According to the comments: Most callers of __set_page_dirty_nobuffers hold
> the page lock. The only exception seems to be zap_pte_range where we
> transfer the dirty information from the pte to the page. This is now much
> rarer since the dirty mmap tracking patches make the fault handler deal
> with it.

No, it still happens as frequently as before - because the page must STAY
dirty while it's writable. So it's still dirty when unmapped.

What the dirty mmap tracking patches do is that if a _lot_ of pages are
dirty, we'll do the balancing, and at that point, if we end up doing IO,
we mark them clean in the page tables again.

But don't think that they aren't dirty in the page tables, or that
zap_page_range() has nothing to do. I made that thinko myself at some
point, but it's wrong.
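
As a toy illustration of the point above (this is NOT kernel code; the
toy_* structs and functions are made up for the example and only model
the two dirty bits), a page that was written through a mapping and has
not yet gone through writeback still has a dirty pte at unmap time, so
the unmap path still has to transfer that bit to the page:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types are hypothetical and do not mirror the
 * real kernel data structures. */
struct toy_page { bool dirty; };
struct toy_pte {
    bool present;
    bool writable;
    bool dirty;
    struct toy_page *page;
};

/* Write fault on a shared mapping: the pte becomes writable and dirty,
 * and the page itself is marked dirty so that it gets accounted. */
static void toy_write_fault(struct toy_pte *pte)
{
    assert(pte->present);
    pte->writable = true;
    pte->dirty = true;
    pte->page->dirty = true;
}

/* Writeback: clean the page and write-protect the pte, so the next
 * store faults again and re-dirties it. */
static void toy_writeback(struct toy_pte *pte)
{
    pte->writable = false;
    pte->dirty = false;
    pte->page->dirty = false;
}

/* Unmap: transfer the pte dirty bit to the page. Returns true if the
 * pte was dirty, i.e. if the unmap path had to dirty the page. */
static bool toy_zap(struct toy_pte *pte)
{
    bool was_dirty = pte->present && pte->dirty;

    if (was_dirty)
        pte->page->dirty = true;
    pte->present = false;
    pte->writable = false;
    pte->dirty = false;
    return was_dirty;
}
```

In this model toy_zap() only finds a clean pte if toy_writeback() ran
after the last write fault. Otherwise the pte is still dirty when it is
torn down, which is the common case.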

> Still, the page dirtying in zap_pte_range remains a potential trouble spot
> for the remaining cases in which we need to dirty pages there since it is
> not rate limited. There is a potential cause for creating deadlocks
> because too many pages were dirtied and the file system cannot allocate
> enough memory for writeout.

Well, this is what the dirty mmap tracking should finally have fixed. We
know we have only a _limited_ number of dirty pages, in a way that we
never knew before.
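
A minimal sketch of what "limited" means here, under loud assumptions:
the toy_vm struct, the fixed limit, and the "write out down to half the
limit" policy are all invented for illustration and are not how the
kernel's balancing is implemented. The only idea it shows is that
accounting at dirtying time lets you enforce a bound:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy accounting, not the kernel's: a global dirty-page
 * count with a fixed limit that is enforced when a page is dirtied. */
struct toy_vm {
    size_t nr_dirty;     /* pages currently dirty */
    size_t dirty_limit;  /* threshold at which writeout is forced */
    size_t nr_written;   /* pages cleaned by forced writeout */
};

/* Pretend to write out pages until we are back at half the limit. */
static void toy_balance_dirty(struct toy_vm *vm)
{
    while (vm->nr_dirty > vm->dirty_limit / 2) {
        vm->nr_dirty--;
        vm->nr_written++;
    }
}

/* Called when a clean page is dirtied through a mapping: account it,
 * then throttle the dirtier if the limit has been reached. */
static void toy_account_dirty(struct toy_vm *vm)
{
    vm->nr_dirty++;
    if (vm->nr_dirty >= vm->dirty_limit)
        toy_balance_dirty(vm);
    /* The invariant we never had before: the count stays bounded. */
    assert(vm->nr_dirty <= vm->dirty_limit);
}
```

However many times toy_account_dirty() is called, nr_dirty never
exceeds dirty_limit, because every dirtying event goes through the
accounting.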

But afaik, it wasn't just zap_pte_range(), it was something else too that
didn't have the page lock. Maybe I remember wrongly, though.

Anyway, the fundamental issue doesn't go away: the page lock isn't
actually the worst of the dirty bit handling.

Linus