Re: [PATCH 0/2] page_count can't be used to decide when wp_page_copy

From: Linus Torvalds
Date: Thu Jan 07 2021 - 16:06:36 EST


On Thu, Jan 7, 2021 at 12:32 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Which is really why I think this needs to be fixed by just fixing UFFD
> to take the write lock.

Side note, and not really related to UFFD, but the mmap_sem in
general: I was at one point actually hoping that we could make the
mmap_sem a spinlock, or at least make the rule be that we never do any
IO under it. At which point a write lock hopefully really shouldn't be
such a huge deal.

The main source of IO under the mmap lock was traditionally the page
faults obviously needing to read the data in, but we now try to handle
that with the whole notion of page fault restart instead.

But I'm 100% sure we don't do as good a job of it as we _could_ do,
and there are probably a lot of other cases where we end up doing IO
under the mmap lock simply because we can and nobody has looked at it
very much.

So if taking the mmap_sem for writing is a huge deal - because it ends
up serializing with IO by people who take it for reading - I think
that is something that might be worth really looking into.

For example, right now I think we (still) only do the page fault retry
once - and the second time if the page still isn't available, we'll
actually wait with the mmap_sem held. That goes back to the very
original page fault retry logic, when I was worried that some infinite
retry would cause busy-waiting because somebody didn't do the proper
"drop mmap_sem, then wait, then return retry".

And if that actually causes problems, maybe we should just make sure
to fix it? remove that FAULT_FLAG_TRIED bit entirely, and make the
rule be that we always drop the mmap_sem and retry?

Similarly, if there are users that don't set FAULT_FLAG_ALLOW_RETRY at
all (because they don't have the logic to check if it's a re-try and
re-do the mmap_sem etc), maybe we can just fix them. I think all the
architectures do it properly in their page fault paths (I think Peter
Xu converted them all - no?), but maybe there are cases of GUP that
don't have it.

Or maybe there is something else that I just didn't notice, where we
end up having bad latencies on the mmap_sem.

I think those would very much be worth fixing, so that if
UFFDIO_WRITEPROTECT taking the mmapo_sem for writing causes problems,
we can _fix_ those problems.

But I think it's entirely wrong to treat UFFDIO_WRITEPROTECT as
specially as Andrea seems to want to treat it. Particularly with
absolutely zero use cases to back it up.

Linus