On Thu, Aug 01, 2024 at 08:36:32AM +0200, David Hildenbrand wrote:
I just added another printf to postcopy_ram_supported_by_host(), where
we temporarily do a UFFDIO_REGISTER on some test area.
Sensing UFFD support # postcopy_ram_supported_by_host()
Sensing UFFD support
Writing received pages during precopy # ram_load_precopy()
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Writing received pages during precopy
Disabling THP: MADV_NOHUGEPAGE # postcopy_ram_prepare_discard()
Discarding pages # loadvm_postcopy_ram_handle_discard()
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Discarding pages
Registering UFFD # postcopy_ram_incoming_setup()
We could think about using this "ever used uffd" to avoid the shared
zeropage in most processes.
Of course, there might be other applications where that wouldn't work,
but I think this behavior (writing to the area before enabling uffd)
might be fairly QEMU-specific already.
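
To make the ordering in the trace above concrete, here is a minimal
userspace sketch of the same sequence on a private anonymous mapping:
populate, disable THP, discard, and only then (not done here) register
uffd. It is not QEMU code. write_then_discard() is a hypothetical helper,
and it assumes the discard is a plain madvise(MADV_DONTNEED), which is
what I would expect for anonymous memory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: mimic the order of operations in the
 * trace on a private anonymous mapping. Returns 0 if every byte reads back
 * as zero after the discard, 1 if stale data survived, -1 on setup failure.
 */
static int write_then_discard(size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len, i;
	unsigned char *area;
	int stale = 0;

	assert(psz > 0 && npages > 0);
	len = npages * (size_t)psz;

	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

	/* "Writing received pages during precopy": populate the area. */
	memset(area, 0xAB, len);

#ifdef MADV_NOHUGEPAGE
	/* "Disabling THP": best effort, may fail on kernels without THP. */
	(void)madvise(area, len, MADV_NOHUGEPAGE);
#endif

	/* "Discarding pages": drop the populated pages again. */
	if (madvise(area, len, MADV_DONTNEED) != 0) {
		munmap(area, len);
		return -1;
	}

	/* A real user would UFFDIO_REGISTER the range here, before any access. */

	/* Without uffd, touching the range now faults in zero-filled memory. */
	for (i = 0; i < len; i++) {
		if (area[i] != 0) {
			stale = 1;
			break;
		}
	}

	munmap(area, len);
	return stale;
}
```

The point of the sketch is only the ordering: the range has been written
and discarded before uffd ever sees it, so anything keyed on "uffd is
registered on this VMA" comes too late for the earlier writes.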
It makes me a bit uneasy to hardcode this into the kernel. It's fairly
specific to QEMU/CRIU, and won't protect use cases that behave slightly
differently.
It would also give userfaultfd users that aren't susceptible to this
particular scenario a different code path.
Avoiding the shared zeropage has the benefit that a later write fault
won't have to do a TLB flush and can simply install a fresh anon page.
That's true - although if that happens frequently, it's something we
might want to tune the shrinker for anyway. If subpages do get used
later, we probably shouldn't have split the THP to begin with.
IMO the safest bet would be to use the zero page unconditionally.
		return false;

	newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
					pvmw->vma->vm_page_prot));
	set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
We're replacing a present page by another present page without doing a
TLB flush in between. I *think* this should be fine because the new
present page is R/O and cannot possibly be written to.
It's safe because it's replacing a migration entry. The TLB was
flushed when that was installed, and since the migration pte is not
marked present it couldn't have re-established a TLB entry.
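
The argument can be spelled out with a toy model of one PTE and one TLB
slot. This is not kernel code: every name below (toy_mmu, toy_access() and
so on) is made up for illustration, and the model assumes the only way to
fill the TLB is a page table walk that finds a present PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one PTE and one TLB slot for one address. */
enum pte_kind { PTE_PRESENT_RW, PTE_MIGRATION, PTE_PRESENT_RO_ZERO };

struct toy_mmu {
	enum pte_kind pte;
	bool tlb_valid;		/* a translation is cached */
	bool tlb_writable;	/* the cached translation permits writes */
};

/* A CPU access: a TLB hit changes nothing, a miss walks the page table. */
static void toy_access(struct toy_mmu *m)
{
	if (m->tlb_valid)
		return;
	if (m->pte == PTE_MIGRATION)
		return;		/* not present: fault, nothing is cached */
	m->tlb_valid = true;
	m->tlb_writable = (m->pte == PTE_PRESENT_RW);
}

/* Unmap for migration: install the non-present entry and flush the TLB. */
static void toy_install_migration_entry(struct toy_mmu *m)
{
	m->pte = PTE_MIGRATION;
	m->tlb_valid = false;
	m->tlb_writable = false;
}

/* Replace the migration entry with the R/O zero page, without a flush. */
static void toy_install_zeropage(struct toy_mmu *m)
{
	assert(m->pte == PTE_MIGRATION);
	m->pte = PTE_PRESENT_RO_ZERO;
}

/* Returns true if a stale translation could survive the remap. */
static bool toy_stale_translation_possible(void)
{
	struct toy_mmu m = { PTE_PRESENT_RW, false, false };
	bool cached_while_unmapped;

	toy_access(&m);			/* caches a writable translation */
	toy_install_migration_entry(&m);/* flushes it */
	toy_access(&m);			/* faults, caches nothing */
	cached_while_unmapped = m.tlb_valid;
	toy_install_zeropage(&m);	/* no flush needed */

	return cached_while_unmapped || (m.tlb_valid && m.tlb_writable);
}
```

In this model toy_stale_translation_possible() returns false: the flush
happens when the migration entry goes in, and nothing can repopulate the
TLB until a present PTE is installed again.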