Re: [PATCH v10 3/6] cxl/memdev: Add trigger_poison_list sysfs attribute

From: Alison Schofield
Date: Fri Mar 31 2023 - 11:45:40 EST


On Thu, Mar 30, 2023 at 03:55:46PM -0700, Dave Jiang wrote:
>
>
> On 3/21/23 7:12 PM, alison.schofield@xxxxxxxxx wrote:
> > From: Alison Schofield <alison.schofield@xxxxxxxxx>
> >
> > When a boolean 'true' is written to this attribute the memdev driver
> > retrieves the poison list from the device. The list consists of
> > addresses that are poisoned, or would result in poison if accessed,
> > and the source of the poison. This attribute is only visible for
> > devices supporting the capability. The retrieved errors are logged
> > as kernel trace events with the label 'cxl_poison'.
> >
> > Signed-off-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > ---
> > Documentation/ABI/testing/sysfs-bus-cxl | 14 ++++++++
> > drivers/cxl/core/memdev.c | 48 +++++++++++++++++++++++++
> > drivers/cxl/cxlmem.h | 5 ++-
> > drivers/cxl/mem.c | 36 +++++++++++++++++++
> > 4 files changed, 102 insertions(+), 1 deletion(-)
> >

snip

> > +static int cxl_get_poison_by_memdev(struct cxl_memdev *cxlmd)
> > +{
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	u64 offset, length;
> > +	int rc = 0;
> > +
> > +	/* CXL 3.0 Spec 8.2.9.8.4.1 Separate pmem and ram poison requests */
> > +	if (resource_size(&cxlds->pmem_res)) {
> > +		offset = cxlds->pmem_res.start;
> > +		length = resource_size(&cxlds->pmem_res);
> > +		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> > +		if (rc)
> > +			return rc;
> > +	}
> > +	if (resource_size(&cxlds->ram_res)) {
> > +		offset = cxlds->ram_res.start;
> > +		length = resource_size(&cxlds->ram_res);
> > +		rc = cxl_mem_get_poison(cxlmd, offset, length, NULL);
> > +		/*
> > +		 * Invalid Physical Address is not an error for
> > +		 * volatile addresses. Device support is optional.
> > +		 */
> > +		if (rc == -EFAULT)

See this -EFAULT check. That is why I changed the return-code table
further down to let -EFAULT through explicitly.

snip

> > @@ -130,6 +177,7 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > {
> > if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> > return 0;
> > +
>
> Stray blank line?

Yes.

>
> > return a->mode;
> > }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 57a5999ddb35..5febaa3f9b04 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -145,7 +145,7 @@ struct cxl_mbox_cmd {
> > C(FWROLLBACK, -ENXIO, "rolled back to the previous active FW"), \
> > C(FWRESET, -ENXIO, "FW failed to activate, needs cold reset"), \
> > C(HANDLE, -ENXIO, "one or more Event Record Handles were invalid"), \
> > - C(PADDR, -ENXIO, "physical address specified is invalid"), \
> > + C(PADDR, -EFAULT, "physical address specified is invalid"), \
>
> Seems unrelated change? Does it go with previous patch?

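See my earlier note. This change is needed explicitly: with PADDR mapped
to -EFAULT instead of the shared -ENXIO, cxl_get_poison_by_memdev() can
tolerate an invalid physical address on the ram request without masking
other mailbox errors.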

>
snip to end.