Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

From: Dan Williams
Date: Fri Jun 17 2022 - 15:02:56 EST


Alison Schofield wrote:
> On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:
> > On Tue, 14 Jun 2022, alison.schofield@xxxxxxxxx wrote:
> >
> > >From: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >
> > >CXL devices that support persistent memory maintain a list of locations
> > >that are poisoned or result in poison if the addresses are accessed by
> > >the host.
> > >
> > >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > >list as a set of Media Error Records that include the source of the
> > >error, the starting device physical address and length. The length is
> > >the number of adjacent DPAs in the record and is in units of 64 bytes.
> > >
> > >Retrieve the list and log each Media Error Record as a trace event of
> > >type cxl_poison_list.
> > >
> > >Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > >---
> > > drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> > > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 118 insertions(+)
> > >
> snip
>
> > >+int cxl_mem_get_poison_list(struct device *dev)
> > >+{
> > >+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > >+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > >+ struct cxl_mbox_poison_payload_out *po;
> > >+ struct cxl_mbox_poison_payload_in pi;
> > >+ int nr_records = 0;
> > >+ int rc, i;
> > >+
> > >+ if (range_len(&cxlds->pmem_range)) {
> > >+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > >+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
>
> First off - you stopped at a bug here - that pi.length needs to be
> in units of 64 bytes.
> >
> > Do you ever see this changing to not always use the full pmem DPA range
> > but allow arbitrary ones? I also assume this is the reason why you don't
> > check the range vs cxlds->ram_range to prevent any overlaps, no?
> >
> > Thanks,
> > Davidlohr
>
> David - Great question!
>
> I'm headed in this direction -
>
> cxl list --media-errors -m mem1
> lists media errors for requested memdev
>
> cxl list --media-errors -r region#
> lists region errors with HPA addresses
> (So here cxl tool will collect the poison for all the regions
> memdevs and do the DPA to HPA translation)
>
> To answer your question, I wasn't thinking of limiting
> the range within the memdev, but certainly could. And if we were
> taking in ranges, those ranges would need to be checked.
>
> $cxl list --media-errors -m mem1 --range-start= --range-end|len=
>
> Now, if I left the sysfs inteface as is, the driver will read the
> entire poison list for the memdev and then cxl tool will filter it
> for the range requested.
>
> Or, maybe we should implement in libcxl (not sysfs), with memdev and
> range options and only collect from the device the range requested.
>
> Either one looks the same to the cxl tool user, but limiting the
> range we send to the device would certainly cut down on unwanted
> records being logged, retrieved, and examined.
>
> I'd like to hear more from you and other community members.

There is some history here that is relevant to this design. CXL Get
Poison List builds on lessons learned from the ACPI "Address Range
Scrub" mechanism that was deployed for ACPI described persistent memory
platform (See ACPI 6.4 9.20.7.2 "Address Range Scrubbing (ARS)
Overview"). In that case there was no expectation that the device
maintained a cached and coherent (with incoming poison writes) copy of
the media error list. CXL Get Poison List in comparison is meant to
obviate the need to perform Scan Media in most scenarios, and it is
lightweight enough that userspace need not have a mechanism to request
errors by range, in my opinion.

One of the design warts of drivers/acpi/nfit/ that I want to get away
from in the case of drivers/cxl/ is snooping the equivalent of ARS
command results to populate a kernel list of poison addresses and
instead put the onus on userspace to collect DPA events and optionally
inform the kernel of the HPA impacts. For example, DAX filesystems will
soon have the ability to do something useful with poison notifications
[1], but that support is limited to synchronously consumed poison
flagged by memory_failure(). When the cxl tool translates the poison
list to HPA it needs an ABI to turn around and notify the filesystem
about which blocks got clobbered.

[1]: https://lore.kernel.org/all/20220616193157.2c2e963f3e7e38dfac554a28@xxxxxxxxxxxxxxxxxxxx/