Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending

From: Linus Torvalds
Date: Thu Jan 07 2021 - 17:52:28 EST

On Thu, Jan 7, 2021 at 2:42 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> But I thought we agreed earlier that that is a bug. And I thought the
> softdirty code already got it for writing.

Ho humm. I had obviously not looked very much at that code. I had done
a quick git grep, but now that I look closer, it *does* get the
mmap_sem for writing, but only for that VM_SOFTDIRTY bit clearing, and
then it does a mmap_write_downgrade().

So that's why I had looked more at the UFFD code, because that one was
the one I was aware of doing this all with just the read lock. I
_thought_ the softdirty code already took the write lock and wouldn't
race with page faults.

But I had missed that write_downgrade. So yeah, this code has the same issue.

Anyway, the fix is - I think - the same I outlined earlier when I was
talking about UFFD: take the thing for writing, so that you can't

The alternate fix remains "make sure we always flush the TLB before
releasing the page table lock, and make COW do the copy under the page
table lock". But I really would prefer to just have this code follow
all the usual rules, and if it does a write protect, then it should
take the mmap_sem for writing.

Why is that very simple rule so bad?

(And see my unrelated but incidental note on it being a good idea to
try to minimize latency by making surfe we don't do any IO under the
mmap lock - whether held for reading _or_ writing. Because I do think
we can improve in that area, if you have some good test-case).