Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

From: Davidlohr Bueso
Date: Thu Jun 16 2022 - 18:02:17 EST


On Thu, 16 Jun 2022, Alison Schofield wrote:
I'm headed in this direction -

I like these interfaces, btw.

cxl list --media-errors -m mem1
lists media errors for requested memdev

But in this patchset you're only listing for persistent configurations.
So if there is a volatile partition, or the whole device is volatile,
this would not consider that.

So unless I'm missing something, we need to consider ram_range as well.

cxl list --media-errors -r region#
lists region errors with HPA addresses
(So here the cxl tool will collect the poison for all of the region's
memdevs and do the DPA to HPA translation)

I was indeed thinking along these lines. But similar to the above,
the region driver also has plans to enumerate volatile regions
configured by BIOS.


To answer your question, I wasn't thinking of limiting
the range within the memdev, but certainly could. And if we were
taking in ranges, those ranges would need to be checked.
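
A minimal sketch of what checking a requested range could look like,
assuming the memdev exposes one volatile (ram) and one persistent (pmem)
DPA span. The struct and function names are hypothetical and are not
taken from the patchset. A request that straddles both spans is rejected
here only to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one DPA span, e.g. a memdev's ram or pmem range. */
struct dpa_range {
	uint64_t start;	/* first DPA in the span */
	uint64_t len;	/* length in bytes; 0 means the span is empty */
};

/* Return true if [start, start + len) lies entirely inside r, without overflowing. */
static bool dpa_range_contains(const struct dpa_range *r,
			       uint64_t start, uint64_t len)
{
	uint64_t off;

	if (len == 0 || r->len == 0)
		return false;
	if (start < r->start)
		return false;

	/* start >= r->start, so this subtraction cannot underflow. */
	off = start - r->start;
	if (off >= r->len)
		return false;

	/* Compare against the remaining room instead of computing start + len. */
	return len <= r->len - off;
}

/* Accept a request only if it fits within the ram span or the pmem span. */
static bool poison_request_valid(const struct dpa_range *ram,
				 const struct dpa_range *pmem,
				 uint64_t start, uint64_t len)
{
	return dpa_range_contains(ram, start, len) ||
	       dpa_range_contains(pmem, start, len);
}
```

Checking both spans keeps the volatile case covered rather than assuming
the request always lands in pmem.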

My question was originally considering poisoning only within pmem DPA
ranges, but now I'm wondering if all this also applies equally to volatile
parts as well... Reading the spec, I interpret it as covering both, but
the T3 Memory Device Software Guide, section '2.13.19', only mentions
persistent capacity.


$cxl list --media-errors -m mem1 --range-start= --range-end|len=

I figure this is kind of like the above, with regions being very arbitrary
and dynamic.

Now, if I left the sysfs interface as is, the driver will read the
entire poison list for the memdev and then cxl tool will filter it
for the range requested.
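
To illustrate the userspace-filtering option, here is a minimal sketch of
keeping only the records that overlap a requested DPA range. The record
layout and the function name are hypothetical simplifications, not the
device's record format or an existing libcxl call, and the sketch assumes
no record or request wraps past UINT64_MAX.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified poison record: a DPA start and a length in bytes. */
struct poison_rec {
	uint64_t dpa;
	uint64_t len;
};

/*
 * Copy every record from in[] that overlaps [start, start + len) to out[].
 * out[] must have room for n entries. Returns the number of records copied.
 * Assumption: neither start + len nor dpa + len overflows.
 */
static size_t poison_filter(const struct poison_rec *in, size_t n,
			    uint64_t start, uint64_t len,
			    struct poison_rec *out)
{
	size_t kept = 0;

	if (len == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].len == 0)
			continue;
		/* Half-open intervals overlap when each starts before the other ends. */
		if (in[i].dpa < start + len && start < in[i].dpa + in[i].len)
			out[kept++] = in[i];
	}

	return kept;
}
```

Note that the whole list still has to be read from the device before any
filtering happens, so this does nothing to shorten the read itself.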

Or, maybe we should implement in libcxl (not sysfs), with memdev and
range options and only collect from the device the range requested.

I wonder if the latter may be the better option considering that always
scanning the entire memdev would cause unnecessary media scan wait times,
especially for large capacities.
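
For the collect-only-the-requested-range option, a rough sketch of turning
a byte range into a per-range query follows. It assumes the command's
length field is expressed in 64-byte units, which is my reading of the
spec, and the struct below is a hypothetical host-side view rather than
the wire layout (endian conversion is left out).

```c
#include <stdbool.h>
#include <stdint.h>

#define POISON_LEN_UNIT 64ULL	/* assumed granularity of the length field */

/* Hypothetical host-side view of the query; not the mailbox payload layout. */
struct poison_query {
	uint64_t dpa;		/* starting DPA, aligned down to 64 bytes */
	uint64_t len_units;	/* length in 64-byte units, rounded up */
};

/* Widen [start, start + len) to 64-byte boundaries and fill in the query. */
static bool poison_query_init(struct poison_query *q,
			      uint64_t start, uint64_t len)
{
	uint64_t first, span;

	/* Reject empty requests and requests that would wrap. */
	if (len == 0 || start > UINT64_MAX - len)
		return false;

	first = start & ~(POISON_LEN_UNIT - 1);
	span = (start + len) - first;

	q->dpa = first;
	q->len_units = span / POISON_LEN_UNIT + (span % POISON_LEN_UNIT != 0);
	return true;
}
```

The range would still need to be validated against the memdev's ram and
pmem spans before it is sent.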

Thanks,
Davidlohr