Re: [RFC 5/8] mm: Add new flag VM_CDM for coherent device memory

From: Dave Hansen
Date: Tue Oct 25 2016 - 16:01:39 EST


On 10/25/2016 12:20 PM, Aneesh Kumar K.V wrote:
> Dave Hansen <dave.hansen@xxxxxxxxx> writes:
>> On 10/23/2016 09:31 PM, Anshuman Khandual wrote:
>>> VMAs containing coherent device memory should be marked with VM_CDM. These
>>> VMAs need to be identified in various core kernel paths and this new flag
>>> will help in this regard.
>>
>> ... and it's sticky? So if a VMA *ever* has one of these funky pages in
>> it, it's stuck being VM_CDM forever? Never to be merged with other
>> VMAs? Never to see the light of autonuma ever again?
>>
>> What if a 100TB VMA has one page of fancy pants device memory, and the
>> rest normal vanilla memory? Do we really want to consider the whole
>> thing fancy?
>
> This definitely needs fine tuning. I guess we should look at this as
> possibly stating that, coherent device would like to not participate in
> auto numa balancing
...

Right, in this one particular case you don't want NUMA balancing. But,
if you have to take an _explicit_ action to even get access to this
coherent memory (setting a NUMA policy), what keeps that explicit action
from also explicitly disabling NUMA migration?

I really don't think we should tie together the isolation aspect with
anything else, including NUMA balancing.

For instance, on x86, we have the ability for devices to grok the CPU's
page tables, including doing faults. There's very little to stop us
from doing things like autonuma.

> One possible option is to use a software pte bit (may be steal
> _PAGE_DEVMAP) and prevent a numa pte setup from change_prot_numa().
> ie, if the pfn backing the pte is from coherent device we don't allow
> that to be converted to a prot none pte for numa faults ?

Why would you need to tag individual pages, especially if the VMA has a
policy set on it that disallows migration?

But, even if you did need to identify individual pages from the PTE, you
can easily do:

page_to_nid(pfn_to_page(pte_pfn(pte)))

and then tell if the node is a fancy-pants device node.
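
To make that lookup chain concrete, here is a rough userspace toy model of
it. This is not kernel code: the helpers below only borrow the names
pte_pfn(), pfn_to_page() and page_to_nid() and are simplified stand-ins
for the real ones. The mem_map layout, the cdm_nodes table, and the
node_is_cdm(), pte_maps_cdm() and mk_pte_for_pfn() helpers are all
hypothetical, made up for illustration.

```c
/*
 * Userspace toy model, NOT kernel code. Helper names mirror the kernel's
 * pte_pfn(), pfn_to_page() and page_to_nid(), but these are simplified
 * stand-ins. node_is_cdm(), pte_maps_cdm() and mk_pte_for_pfn() are
 * hypothetical names invented for this sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_PAGES   8
#define MAX_NODES  4

typedef struct { uint64_t val; } pte_t;
struct page { int nid; };

/* Assumed layout: pfns 0-5 live on node 0, pfns 6-7 on node 1. */
static struct page mem_map[NR_PAGES] = {
	{0}, {0}, {0}, {0}, {0}, {0}, {1}, {1},
};

/* Assumed: node 1 is the only coherent device memory node. */
static const bool cdm_nodes[MAX_NODES] = { false, true, false, false };

/* Extract the page frame number from a (toy) PTE. */
static unsigned long pte_pfn(pte_t pte)
{
	return (unsigned long)(pte.val >> PAGE_SHIFT);
}

/* Map a pfn to its struct page in the flat toy mem_map. */
static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn < NR_PAGES);
	return &mem_map[pfn];
}

/* Return the NUMA node that owns the page. */
static int page_to_nid(const struct page *page)
{
	return page->nid;
}

/* Hypothetical: is this node a coherent device memory node? */
static bool node_is_cdm(int nid)
{
	return nid >= 0 && nid < MAX_NODES && cdm_nodes[nid];
}

/* The whole chain: PTE -> pfn -> page -> node -> "is it a device node?" */
static bool pte_maps_cdm(pte_t pte)
{
	return node_is_cdm(page_to_nid(pfn_to_page(pte_pfn(pte))));
}

/* Build a toy PTE for a pfn, with a low "present" bit set. */
static pte_t mk_pte_for_pfn(unsigned long pfn)
{
	pte_t pte = { ((uint64_t)pfn << PAGE_SHIFT) | 0x1 };
	return pte;
}
```

The point is that the node is derivable from the PTE alone, so nothing in
this chain needs a per-page software PTE bit or a sticky VMA flag.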