Re: [RFC V2 03/12] mm: Change generic FALLBACK zonelist creation process

From: Anshuman Khandual
Date: Wed Feb 01 2017 - 01:57:31 EST


On 01/31/2017 11:34 PM, Dave Hansen wrote:
> On 01/30/2017 11:25 PM, John Hubbard wrote:
>> I also don't like having these policies hard-coded, and your 100x
>> example above helps clarify what can go wrong about it. It would be
>> nicer if, instead, we could better express the "distance" between nodes
>> (bandwidth, latency, relative to sysmem, perhaps), and let the NUMA
>> system figure out the Right Thing To Do.
>>
>> I realize that this is not quite possible with NUMA just yet, but I
>> wonder if that's a reasonable direction to go with this?
>
> In the end, I don't think the kernel can make the "right" decision very
> widely here.
>
> Intel's Xeon Phis have some high-bandwidth memory (MCDRAM) that
> evidently has a higher latency than DRAM. Given a plain malloc(), how
> is the kernel to know that the memory will be used for AVX-512
> instructions that need lots of bandwidth vs. some random data structure
> that's latency-sensitive?

CDM has been designed to work with a driver which can make these kinds
of memory placement decisions along the way. Taking the above example
of a generic malloc() allocated buffer, the sequence would be:

(1) System RAM gets allocated if the first fault comes from a CPU access
(2) CDM memory gets allocated if the first fault comes from a device access
(3) After monitoring the access patterns thereafter, the driver can
    then make the required "right" decisions about its eventual
    placement and migrate the memory as required
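
As a rough illustration of the three steps above, here is a user-space
toy model. It is only a sketch and not code from this patch series:
every name in it (node_kind, accessor, page_state, handle_fault,
driver_rebalance) is hypothetical, and the "migrate towards whoever
accessed the page more" rule is just an assumed stand-in for whatever
policy a real driver would implement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; none of these exist in the kernel. */
enum node_kind { NODE_SYSTEM_RAM, NODE_CDM };
enum accessor  { ACCESS_CPU, ACCESS_DEVICE };

struct page_state {
	int allocated;            /* 0 until the first fault */
	enum node_kind node;      /* where the page currently lives */
	unsigned long cpu_hits;   /* accesses seen from the CPU */
	unsigned long dev_hits;   /* accesses seen from the device */
};

/*
 * Steps (1) and (2): the first fault decides the initial placement.
 * A first CPU fault lands the page in system RAM, a first device
 * fault lands it in CDM memory. Later accesses are only counted.
 */
void handle_fault(struct page_state *p, enum accessor who)
{
	assert(p != NULL);

	if (!p->allocated) {
		p->node = (who == ACCESS_CPU) ? NODE_SYSTEM_RAM : NODE_CDM;
		p->allocated = 1;
	}

	if (who == ACCESS_CPU)
		p->cpu_hits++;
	else
		p->dev_hits++;
}

/*
 * Step (3): assumed driver policy. Move the page towards the side
 * that accessed it more often since the last decision; on a tie,
 * leave it where it is. Returns 1 if the page was "migrated".
 */
int driver_rebalance(struct page_state *p)
{
	enum node_kind want;

	assert(p != NULL);

	if (!p->allocated || p->cpu_hits == p->dev_hits)
		return 0;

	want = (p->dev_hits > p->cpu_hits) ? NODE_CDM : NODE_SYSTEM_RAM;
	if (want == p->node)
		return 0;

	p->node = want;
	p->cpu_hits = 0;          /* start a fresh observation window */
	p->dev_hits = 0;
	return 1;
}
```

In this model a buffer first touched by the CPU starts out in system
RAM and only moves to CDM once the device accesses outnumber the CPU
accesses, at which point the driver, not the core allocator, makes the
call.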

>
> In the end, I think all we can do is keep the kernel's existing default
> of "low latency to the CPU that allocated it", and let apps override
> when that policy doesn't fit them.

I think this is very similar to what we are trying to achieve with
the CDM representation and driver based migrations. Don't you agree?