Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device

From: Dan Williams
Date: Fri Apr 05 2019 - 11:43:17 EST


On Fri, Apr 5, 2019 at 4:19 AM Jonathan Cameron
<jonathan.cameron@xxxxxxxxxx> wrote:
>
> On Thu, 4 Apr 2019 12:08:49 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> > Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> > properties described by the ACPI HMAT is expected to have an application
> > specific consumer.
> >
> > Those consumers may want 100% of the memory capacity to be reserved from
> > any usage by the kernel. By default, with this enabling, a platform
> > device is created to represent this differentiated resource.
> >
> > A follow on change arranges for device-dax to claim these devices by
> > default and provide an mmap interface for the target application.
> > However, if the administrator prefers that some or all of the special
> > purpose memory is made available to the core-mm the device-dax hotplug
> > facility can be used to online the memory with its own numa node.
> >
> > Cc: "Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx>
> > Cc: Len Brown <lenb@xxxxxxxxxx>
> > Cc: Keith Busch <keith.busch@xxxxxxxxx>
> > Cc: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> Hi Dan,
>
> Great to see you getting this discussion going so fast and in
> general the approach makes sense to me.
>
> I'm a little confused why HMAT has anything to do with this.
> SPM is defined either via the attribute in SRAT SPA entries,
> EFI_MEMORY_SP or via the EFI memory map.
>
> Whether it is in HMAT or not isn't all that relevant.
> Back in the days of the reservation hint (so before yesterday :)
> it was relevant obviously but that's no longer true.
>
> So what am I missing?

It's a good question, and an assumption I should have explicitly
declared in the changelog. The problem with EFI_MEMORY_SP is the same
as the problem with the EfiPersistentMemory type: it isn't precise
enough on its own for the kernel to delineate 'type' or
device/replaceable-unit boundaries. For example, I expect one
EFI_MEMORY_SP range of a specific type may be contiguous with another
range of a different type. As in the NFIT case, there is no requirement
in the specification that platform firmware inject multiple range
entries. Instead that precision is left to the SRAT + HMAT, or the
NFIT in the case of PMEM.

Conversely, and thinking through this a bit more, if a memory range is
"special" but the platform fails to enumerate it in HMAT, I think
Linux should scream loudly that the firmware is broken and leave the
range alone. The "scream loudly" piece is missing in the current set,
but the "leave the range alone" functionality is included.
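
As a rough illustration of that policy, here is a minimal userspace
sketch. It is not code from the patch set: the struct, enum, and function
names are hypothetical, and "described by HMAT" is assumed to mean that
the range falls entirely inside one HMAT-described target range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical types for illustration; not the kernel's structures. */
struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

enum spm_action {
	SPM_REGISTER_DEVICE,	/* described by HMAT: create a device for it */
	SPM_LEAVE_ALONE,	/* not described: warn and do not touch it */
};

/* Return true if @r lies entirely within one HMAT-described range. */
static bool hmat_describes(const struct mem_range *hmat, size_t n,
			   const struct mem_range *r)
{
	for (size_t i = 0; i < n; i++)
		if (r->start >= hmat[i].start && r->end <= hmat[i].end)
			return true;
	return false;
}

/*
 * Decide what to do with one special purpose range. An undescribed
 * range triggers a loud warning and is otherwise left untouched.
 */
static enum spm_action classify_spm(const struct mem_range *hmat, size_t n,
				    const struct mem_range *r)
{
	assert(r->start < r->end);

	if (hmat_describes(hmat, n, r))
		return SPM_REGISTER_DEVICE;

	fprintf(stderr,
		"firmware bug: special purpose range [%#llx-%#llx) is not described by HMAT, leaving it alone\n",
		(unsigned long long)r->start, (unsigned long long)r->end);
	return SPM_LEAVE_ALONE;
}
```

A caller would walk the EFI_MEMORY_SP ranges, call classify_spm() on each
one, and only register a device for ranges that come back as
SPM_REGISTER_DEVICE. A range that straddles an HMAT boundary is treated
as undescribed here, which is one possible choice rather than a settled
one.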