Re: [PATCH V3 0/4] Define coherent device memory node

From: Balbir Singh
Date: Thu Feb 16 2017 - 17:15:36 EST

On 16/02/17 05:20, Mel Gorman wrote:
> On Wed, Feb 15, 2017 at 05:37:22PM +0530, Anshuman Khandual wrote:
>> This four patches define CDM node with HugeTLB & Buddy allocation
>> isolation. Please refer to the last RFC posting mentioned here for more
> Always include the background with the changelog itself. Do not assume that
> people are willing to trawl through a load of past postings to assemble
> the picture. I'm only taking a brief look because of the page allocator
> impact but it does not appear that previous feedback was addressed.
> In itself, the series does very little and as Vlastimil already pointed
> out, it's not a good idea to try merge piecemeal when people could not
> agree on the big picture (I didn't dig into it).

The idea of CDM is independent of how some of the other problems related
to AutoNUMA balancing is handled. The idea of this patchset was to introduce
the concept of memory that is not necessarily system memory, but is coherent
in terms of visibility/access with some restrictions

> The only reason I'm commenting at all is to say that I am extremely opposed
> to the changes made to the page allocator paths that are specific to
> CDM. It's been continual significant effort to keep the cost there down
> and this is a mess of special cases for CDM. The changes to hugetlb to
> identify "memory that is not really memory" with special casing is also
> quite horrible.
> It's completely unclear that even if one was to assume that CDM memory
> should be expressed as nodes why such systems do not isolate all processes
> from CDM nodes by default and then allow access via memory policies or
> cpusets instead of special casing the page allocator fast path. It's also
> completely unclear what happens if a device should then access the CDM
> and how that should be synchronised with the core, if that is even possible.

A big part of this is driven by the need to special case what allocations
go there. The idea being that an allocation should get there only when
explicitly requested. Unfortunately, IIUC node distance is not a good
isolation metric. CPUsets are heavily driven by user space and we
believe that setting up CDM is not an administrative operation, its
going to be hard for an administrator or user space application to set
up the right policy or an installer to figure it out. It does not help
that CPUSets assume inheritance from the root hierarchy. As far as the
overheads go, one could consider using STATIC_KEYS if that is worthwhile.

> It's also unclear if this is even usable by an application in userspace
> at this point in time. If it is and the special casing is needed then the
> regions should be isolated from early mem allocations in the arch layer
> that is CDM aware, initialised late, and then setup userspace to isolate
> all but privileged applications from the CDM nodes. Do not litter the core
> with is_cdm_whatever checks.

The idea is to have these nodes as ZONE_MOVABLE and those are isolated from
early mem allocations. Any new feature requires checks, but one could consider
consolidating those checks

Balbir Singh.