Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
From: Andrea Arcangeli
Date: Tue Dec 22 2020 - 16:36:34 EST
On Tue, Dec 22, 2020 at 12:58:18PM -0800, Nadav Amit wrote:
> I had somewhat similar ideas - saving in each page-struct the generation,
> which would allow to: (1) extend pte_same() to detect interim changes
> that were reverted (RO->RW->RO) and (2) per-PTE pending flushes.

What don't you feel safe about? What's the problem with RO->RW->RO? I
don't get it.

pte_same is perfectly OK without a sequence counter in my view. I have
never seen anything that would not be OK with pte_same, provided all the
invariants are respected. It's actually a great optimization compared
to any unscalable sequence counter.
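
To make the two positions concrete, here is a minimal userspace sketch of
a value-only comparison versus the generation-extended one proposed in the
quoted text. It is a toy model, not kernel code: toy_pte_t, the TOY_* bits
and every toy_* function are hypothetical names invented for illustration,
and the per-PTE generation field is an assumption about how such a counter
might be stored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these bits are illustrative, not the kernel's layout. */
#define TOY_PTE_PRESENT (1u << 0)
#define TOY_PTE_WRITE   (1u << 1)

typedef struct {
    uint32_t bits;       /* toy PTE value */
    uint32_t generation; /* hypothetical per-PTE change counter */
} toy_pte_t;

/* Value-only comparison, in the spirit of pte_same(). */
static bool toy_pte_same(toy_pte_t a, toy_pte_t b)
{
    return a.bits == b.bits;
}

/* Hypothetical extension: the generation must match as well. */
static bool toy_pte_same_gen(toy_pte_t a, toy_pte_t b)
{
    return a.bits == b.bits && a.generation == b.generation;
}

/* Every modification has to bump the counter; this is the added cost. */
static void toy_pte_set_bits(toy_pte_t *pte, uint32_t bits)
{
    pte->bits = bits;
    pte->generation++;
}

/* Replay RO->RW->RO and report what each comparison concludes. */
static void toy_ro_rw_ro(bool *same_by_value, bool *same_by_gen)
{
    toy_pte_t pte = { TOY_PTE_PRESENT, 0 };               /* RO */
    toy_pte_t orig = pte;                                 /* snapshot */

    toy_pte_set_bits(&pte, TOY_PTE_PRESENT | TOY_PTE_WRITE); /* RW */
    toy_pte_set_bits(&pte, TOY_PTE_PRESENT);                 /* RO again */

    *same_by_value = toy_pte_same(orig, pte);
    *same_by_gen = toy_pte_same_gen(orig, pte);
}
```

In this model the value-only check reports "same" after the round trip,
while the generation check reports a change. Whether that extra detection
matters is exactly what is disputed here: the sketch only shows the
mechanism and the per-modification increment it requires.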

The counter would slow everything down: having to increase a counter
every time you change a pte, no matter whether it's per-pgtable,
per-vma or per-mm, sounds very bad.

I'd rather take the mmap_lock for writing across the whole userfaultfd
ioctl than have to deal with a new sequence counter increase for
every pte modification on a heavily contended cacheline.

Also note the counter would have solved nothing for
userfaultfd_writeprotect: it's useless for detecting stale TLB entries.
See how the !pte_write check happens after the counter was already
increased:

CPU0                    CPU 1                   CPU 2
------                  --------                -------
userfaultfd_wrprotect(mode_wp = true)
PT lock
atomic set _PAGE_UFFD_WP and clear _PAGE_WRITE
false_shared_counter_counter++
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PT unlock

                        do_page_fault FAULT_FLAG_WRITE
                                                userfaultfd_wrprotect(mode_wp = false)
                                                PT lock
                                                ATOMIC clear _PAGE_UFFD_WP <- problem
                                                /* _PAGE_WRITE not set */
                                                false_shared_counter_counter++
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                PT unlock
                                                XXXXXXXXXXXXXX BUG RACE window open here

                        PT lock
                        counter = false_shared_counter_counter
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                        FAULT_FLAG_WRITE is set by CPU
                        _PAGE_WRITE is still clear in pte
                        PT unlock

                        wp_page_copy
                        copy_user_page runs with stale TLB

                        pte_same(counter, orig_pte, pte) -> PASS
                                 ^^^^^^^                    ^^^^
                        commit the copy to the pte with the lost writes

deferred tlb flush <- too late
XXXXXXXXXXXXXX BUG RACE window close here
================================================================================
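
The interleaving above can also be replayed deterministically in a small
userspace model. This is only a sketch under stated assumptions: struct
toy_mm, the TOY_* bits and toy_replay_race() are hypothetical names, a page
is reduced to a single int, and the "stale TLB" is modelled as a flag that
lets one late write reach the old page after the copy was taken. It is not
the kernel's fault path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: bit names and fields are illustrative. */
#define TOY_WRITE   (1u << 0)
#define TOY_UFFD_WP (1u << 1)

struct toy_mm {
    uint32_t pte;            /* toy PTE flags */
    uint32_t counter;        /* hypothetical shared modification counter */
    bool stale_tlb_writable; /* a writable translation is still cached */
    int old_page;            /* contents of the original page */
    int new_page;            /* contents of the COW copy */
};

struct toy_result {
    bool recheck_passed; /* counter-extended pte_same() saw no change */
    bool write_lost;     /* committed copy misses a write to the old page */
};

/* Counter-extended comparison: both counter and pte value must match. */
static bool toy_pte_same(uint32_t c0, uint32_t c1, uint32_t p0, uint32_t p1)
{
    return c0 == c1 && p0 == p1;
}

static struct toy_result toy_replay_race(void)
{
    struct toy_mm mm = { .pte = TOY_WRITE, .counter = 0,
                         .stale_tlb_writable = true,
                         .old_page = 1, .new_page = 0 };
    struct toy_result res = { false, false };

    /* CPU0: wrprotect(mode_wp = true); the TLB flush is deferred. */
    mm.pte = (mm.pte | TOY_UFFD_WP) & ~TOY_WRITE;
    mm.counter++;

    /* CPU2: wrprotect(mode_wp = false); the write bit stays clear. */
    mm.pte &= ~TOY_UFFD_WP;
    mm.counter++;

    /* CPU1: the write fault samples state after both increments. */
    uint32_t counter = mm.counter;
    uint32_t orig_pte = mm.pte;

    if (!(orig_pte & TOY_WRITE)) {
        /* wp_page_copy: take the copy of the old page. */
        mm.new_page = mm.old_page;

        /* A write through the stale TLB entry lands on the old page. */
        if (mm.stale_tlb_writable)
            mm.old_page = 2;

        /* Recheck before commit: nothing changed since the sample. */
        res.recheck_passed = toy_pte_same(counter, mm.counter,
                                          orig_pte, mm.pte);
        if (res.recheck_passed)
            res.write_lost = (mm.new_page != mm.old_page);
    }

    /* CPU0: the deferred flush arrives too late. */
    mm.stale_tlb_writable = false;
    return res;
}
```

In this model the recheck passes because the counter is sampled only after
both increments, yet the committed copy is missing the late write, which is
the point the diagram makes about the counter not detecting stale TLB
entries.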