Re: [PATCH v8 5/8] mm: Device exclusive memory access

From: Alistair Popple
Date: Fri May 21 2021 - 02:53:50 EST


On Tuesday, 18 May 2021 11:19:05 PM AEST Alistair Popple wrote:

[...]

> > > +/*
> > > + * Restore a potential device exclusive pte to a working pte entry
> > > + */
> > > +static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
> > > +{
> > > + struct page *page = vmf->page;
> > > + struct vm_area_struct *vma = vmf->vma;
> > > + struct page_vma_mapped_walk pvmw = {
> > > + .page = page,
> > > + .vma = vma,
> > > + .address = vmf->address,
> > > + .flags = PVMW_SYNC,
> > > + };
> > > + vm_fault_t ret = 0;
> > > + struct mmu_notifier_range range;
> > > +
> > > + if (!lock_page_or_retry(page, vma->vm_mm, vmf->flags))
> > > + return VM_FAULT_RETRY;
> > > + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma,
> > > vma->vm_mm,
> > > + vmf->address & PAGE_MASK,
> > > + (vmf->address & PAGE_MASK) + PAGE_SIZE);
> > > + mmu_notifier_invalidate_range_start(&range);
> >
> > I looked at MMU_NOTIFIER_CLEAR document and it tells me:
> >
> > * @MMU_NOTIFY_CLEAR: clear page table entry (many reasons for this like
> > * madvise() or replacing a page by another one, ...).
> >
> > Does MMU_NOTIFIER_CLEAR suite for this case? Normally I think for such a
> > case (existing pte is invalid) we don't need to notify at all. However
> > from what I read from the whole series, this seems to be a critical point
> > where we'd like to kick the owner/driver to synchronously stop doing
> > atomic
> > operations from the device. Not sure whether we'd like a new notifier
> > type, or maybe at least comment on why to use CLEAR?
>
> Right, notifying the owner/driver when it no longer has exclusive access to
> the page and allowing it to stop atomic operations is the critical point and
> why it notifies when we ordinarily wouldn't (ie. invalid -> valid).
>
> I did consider adding a new type, but in the driver implementation it ends
> up
> being treated the same as a CLEAR notification anyway so didn't think it was
> necessary. But I suppose adding a different type would allow other listening
> notifiers to filter these which might be worthwhile.
>
> > > +
> > > + while (page_vma_mapped_walk(&pvmw)) {
> >
> > IIUC a while loop of page_vma_mapped_walk() handles thps, however here
> > it's
> > already in do_swap_page() so it's small pte only? Meanwhile...
> >
> > > + if (unlikely(!pte_same(*pvmw.pte, vmf->orig_pte))) {
> > > + page_vma_mapped_walk_done(&pvmw);
> > > + break;
> > > + }
> > > +
> > > + restore_exclusive_pte(vma, page, pvmw.address, pvmw.pte);
> >
> > ... I'm not sure whether passing in page would work for thp after all, as
> > iiuc we may need to pass in the subpage rather than the head page if so.
>
> Hmm, I need to check this and follow up.

Thanks Peter for pointing that out. I needed to follow this up because I had
slightly misunderstood page_vma_mapped_walk(). As you say this is actually a
small pte and restore_exclusive_pte() needs the actual page from the fault. So
I should be able to drop the page_vma_mapped_walk() and use
pte_offset_map_lock() to call restore_exclusive_pte directly.

- Alistair