Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

From: Peter Xu
Date: Mon Dec 21 2020 - 12:28:44 EST


Hi, Nadav,

On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:

[...]

> So to correct myself, I think that what I really encountered was actually
> during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
> problem was that in this case the “write”-bit was removed during unprotect.
> Sorry for the strange formatting to fit within 80 columns:

I assume I can ignore the race mentioned in the commit message but only refer
to this one below. However I'm still confused. Please see below.

>
>
> [ Start: PTE is writable ]
>
> cpu0 cpu1 cpu2
> ---- ---- ----
> [ Writable PTE
> cached in TLB ]

Here cpu2 got writable pte in tlb. But why?

If below is an unprotect, it means it must have been protected once by
userfaultfd, right? If so, the previous change_protection_range() which did
the wr-protect should have done a tlb flush already before it returns (since
pages>0 - we protected one pte at least). Then I can't see why cpu2 tlb has
stall data.

If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
happen either, because cpu1/cpu2 will all go through the cow path and pgtable
lock should serialize them.

> userfaultfd_writeprotect()
> [ write-*unprotect* ]
> mwriteprotect_range()
> mmap_read_lock()
> change_protection()
>
> change_protection_range()
> ...
> change_pte_range()
> [ *clear* “write”-bit ]
> [ defer TLB flushes]
> [ page-fault ]
>
> wp_page_copy()
> cow_user_page()
> [ copy page ]
> [ write to old
> page ]
>
> set_pte_at_notify()
>
> [ End: cpu2 write not copied form old to new page. ]

Could you share how to reproduce the problem? I would be glad to give it a
shot as well.

> [1] https://lore.kernel.org/patchwork/patch/1346386

PS: Sorry to not have read the other series of yours. It seems to need some
chunk of time so I postponed it a bit due to other things; but I'll read at
least the fixes very soon.

Thanks,

--
Peter Xu