Re: [PATCH v3 5/8] mm: Device exclusive memory access

From: Alistair Popple
Date: Tue Mar 02 2021 - 04:24:01 EST


On Tuesday, 2 March 2021 11:05:59 AM AEDT Jason Gunthorpe wrote:
> On Fri, Feb 26, 2021 at 06:18:29PM +1100, Alistair Popple wrote:
>
> > +/**
> > + * make_device_exclusive_range() - Mark a range for exclusive use by a device
> > + * @mm: mm_struct of associated target process
> > + * @start: start of the region to mark for exclusive device access
> > + * @end: end address of region
> > + * @pages: returns the pages which were successfully marked for exclusive access
> > + *
> > + * Returns: number of pages successfully marked for exclusive access
> > + *
> > + * This function finds the ptes mapping page(s) to the given address range and
> > + * replaces them with special swap entries preventing userspace CPU access. On
> > + * fault these entries are replaced with the original mapping after calling MMU
> > + * notifiers.
> > + */
> > +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > +				unsigned long end, struct page **pages)
> > +{
> > + long npages = (end - start) >> PAGE_SHIFT;
> > + long i;
> > +
> > + npages = get_user_pages_remote(mm, start, npages,
> > + FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > + pages, NULL, NULL);
> > + for (i = 0; i < npages; i++) {
> > + if (!trylock_page(pages[i])) {
> > + put_page(pages[i]);
> > + pages[i] = NULL;
> > + continue;
> > + }
> > +
> > + if (!try_to_protect(pages[i])) {
>
> Isn't this racy? get_user_pages returns the ptes at an instant in
> time, they could have already been changed to something else?

Right. On its own this does not guarantee that the page is mapped at the
given location, only that a mapping won't get established without an mmu
notifier callback to clear the swap entry.

The intent was a driver could use HMM or some other mechanism to keep PTEs
synchronised if required. However I just looked at patch 8 in the series again
and it appears I got this wrong when converting from the old migration
approach:

+ mutex_unlock(&svmm->mutex);
+ ret = nouveau_atomic_range_fault(svmm, drm, args,
+ size, hmm_flags, mm);

The mutex needs to be held over the range fault to ensure the PTE hasn't
changed. But that ends up being a problem, because try_to_protect() calls
notifiers which need to take the same mutex, and hence deadlocks.

> I would think you'd want to switch to the swap entry atomically under
> the PTLs?

That is one approach, but the reuse of get_user_pages() to walk the page
tables and fault/gather the pages is a nice simplification and adding a new
FOLL flag/mode to atomically swap entries doesn't seem right.

However try_to_protect() scans the PTEs again under the PTL, so checking
that the mapping of interest is actually the one being replaced during the
rmap walk seems like a reasonable solution. Thanks for the comments.

- Alistair

> Jason
>