Re: [PATCH v18 04/19] EDAC: Add memory repair control feature

From: Jonathan Cameron
Date: Thu Jan 09 2025 - 09:25:00 EST


On Thu, 9 Jan 2025 13:32:22 +0100
Borislav Petkov <bp@xxxxxxxxx> wrote:

Hi Boris,

> On Thu, Jan 09, 2025 at 11:00:43AM +0000, Shiju Jose wrote:
> > The min_ and max_ attributes of the control attributes are added for your
> > feedback on V15 to expose supported ranges of these control attributes to the user,
> > in the following links.
>
> Sure, but you can make that differently:
>
> cat /sys/bus/edac/devices/<dev-name>/mem_repairX/bank
> [x:y]
>
> which is the allowed range.

To my thinking that would fail the test of being an intuitive interface.
To issue a repair command requires that multiple attributes be configured
before triggering the actual repair.

Think of it as setting the coordinates of the repair in a high dimensional
space.

In the extreme case of fine grained repair (Cacheline), to identify the
relevant subunit of memory (obtained from the error record that we are
basing the decision to repair on) we need to specify all of:

Channel, sub-channel, rank, bank group, row, column and nibble mask.
For coarser granularity repair only specify a subset of these applies and
only the relevant controls are exposed to userspace.

They are broken out as specific attributes to enable each to be set before
triggering the action with a write to the repair attribute.

There are several possible alternatives:

Option 1

"A:B:C:D:E:F:G:H:I:J" opaque single write to trigger the repair where
each number is providing one of those coordinates and where a readback
let's us known what each number is.

That single attribute interface is very hard to extend in an intuitive way.

History tell us more levels will be introduced in the middle, not just
at the finest granularity, making such an interface hard to extend in
a backwards compatible way.

Another alternative of a key value list would make for a nasty sysfs
interface.

Option 2
There are sysfs interfaces that use a selection type presentation.

Write: "C", Read: "A, B, [C], D" but that only works well for discrete sets
of options and is a pain to parse if read back is necessary.

So in conclusion, I think the proposed multiple sysfs attribute style
with them reading back the most recent value written is the least bad
solution to a complex control interface.

>
> echo ...
>
> then writes in the bank.
>
> > ... so we would propose we do not add max_ and min_ for now and see how the
> > use cases evolve.
>
> Yes, you should apply that same methodology to the rest of the new features
> you're adding: only add functionality for the stuff that is actually being
> used now. You can always extend it later.
>
> Changing an already user-visible API is a whole different story and a lot lot
> harder, even impossible.
>
> So I'd suggest you prune the EDAC patches from all the hypothetical usage and
> then send only what remains so that I can try to queue them.

Sure. In this case the addition of min/max was perhaps a wrong response to
your request for a way to those ranges rather than just rejecting a write
of something out of range as earlier version did.

We can revisit in future if range discovery becomes necessary. Personally
I don't think it is given we are only taking these actions in response error
records that give us precisely what to write and hence are always in range.

Jonathan

>
> Thx.
>