Re: [RFC V2 11/12] mm: Tag VMA with VM_CDM flag during page fault

From: Dave Hansen
Date: Mon Jan 30 2017 - 12:52:13 EST


Here's the flag definition:

> +#ifdef CONFIG_COHERENT_DEVICE
> +#define VM_CDM 0x00800000 /* Contains coherent device memory */
> +#endif

But it doesn't match the implementation:

> +#ifdef CONFIG_COHERENT_DEVICE
> +static void mark_vma_cdm(nodemask_t *nmask,
> + struct page *page, struct vm_area_struct *vma)
> +{
> + if (!page)
> + return;
> +
> + if (vma->vm_flags & VM_CDM)
> + return;
> +
> + if (nmask && !nodemask_has_cdm(*nmask))
> + return;
> +
> + if (is_cdm_node(page_to_nid(page)))
> + vma->vm_flags |= VM_CDM;
> +}

That flag is a one-way trip. Any VMA with that flag set on it will keep
it for the life of the VMA, despite whether it has CDM pages in it now
or not. Even if you changed the policy back to one that doesn't allow
CDM and forced all the pages to be migrated out.

This also assumes that the only way to get a page mapped into a VMA is
via alloc_pages_vma(). Do the NUMA migration APIs use this path?

When you *set* this flag, you don't go and turn off KSM merging, for
instance. You keep it from being turned on from this point forward, but
you don't turn it off.

This is happening with mmap_sem held for read. Correct? Is it OK that
you're modifying the VMA? That vm_flags manipulation is non-atomic, so
how can that even be safe?

If you're going to go down this route, I think you need to be very
careful. We need to ensure that when this flag gets set, it's never set
on VMAs that are "normal" and will only be set on VMAs that were
*explicitly* set up for accessing CDM. That means that you'll need to
make sure that there's no possible way to get a CDM page faulted into a
VMA unless it's via an explicitly assigned policy that would have cause
the VMA to be split from any "normal" one in the system.

This all makes me really nervous.