Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

From: Linus Torvalds
Date: Mon Dec 21 2020 - 14:56:05 EST


On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> Nadav Amit found memory corruptions when running userfaultfd test above.
> It seems to me the problem is related to commit 09854ba94c6a ("mm:
> do_wp_page() simplification"). Can you please take a look? Thanks.
>
> TL;DR: it may not safe to make copies of singly mapped (non-COW) pages
> when it's locked or has additional ref count because concurrent
> clear_soft_dirty or change_pte_range may have removed pte_write but yet
> to flush tlb.

Hmm. The TLB flush shouldn't actually matter, because anything that
changes the writable bit had better be serialized by the page table
lock.

Yes, we often load the page table value without holding the page table
lock (in order to know what we are going to do), but then before we
finalize the operation, we then re-check - undet the page table lock -
that the value we loaded still matches.

But I think I see what *MAY* be going on. The userfaultfd
mwriteprotect_range() code takes the mm lock for _reading_. Which
means that you can have

Thread A Thread B

- fault starts. Sees write-protected pte, allocates memory, copies data

- userfaultfd makes the regions writable

- usefaultfd case writes to the region

- userfaultfd makes region non-writable

- fault continues, gets the page table lock, sees that the pte is the
same, uses old copied data

But if this is what's happening, I think it's a userfaultfd bug. I
think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be
a mmap_write_lock().

mprotect() does this right, it looks like userfaultfd does not. You
cannot just change the writability of a page willy-nilly without the
correct locking.

Maybe there are other causes, but this one stands out to me as one
possible cause.

Comments?

Linus