Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

From: Jonathan Cameron
Date: Tue May 28 2024 - 05:07:01 EST


On Mon, 27 May 2024 11:21:31 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Mon, May 20, 2024 at 12:58:57PM +0100, Jonathan Cameron wrote:
> > > Following are some of the use cases of generic scrub control
> > > subsystem as given in the cover letter. Request please add any
> > > other use cases, which I missed.
> > >
> > > 1. There are several types of interfaces to HW memory scrubbers
> > > identified such as ACPI NVDIMM ARS(Address Range Scrub), CXL
> > > memory device patrol scrub, CXL DDR5 ECS, ACPI RAS2 memory
> > > scrubbing features and software based memory scrubber(discussed
> > > in the community Reference [5] in the cover letter). Also some
> > > scrubbers support controlling (background) patrol scrubbing(ACPI
> > > RAS2, CXL) and/or on-demand scrubbing(ACPI RAS2, ACPI ARS).
> > > However the scrub controls varies between memory scrubbers. Thus
> > > there is a need for a standard generic ABI and sysfs scrub
> > > controls for the userspace tools, which control HW and SW
> > > scrubbers in the system, for the easiness of use.
>
> This is all talking about what hw functionality there is. I'm more
> interested in the "there is a need" thing. What need? How?
>
> In order to support something like this upstream, I'd like to know how
> it is going to be used and whether the major use cases are covered. So
> that everyone can benefit from it - not only your employer.

Fair questions.

>
> > > 2. Scrub controls in user space allow the user space tool to disable
> > > and enable the feature in case disabling of the background patrol
> > > scrubbing and changing the scrub rate are needed for other
> > > purposes such as performance-aware operations which requires the
> > > background operations to be turned off or reduced.
>
> Who's going to use those scrub controls? Tools? Admins? Scripts?

If we're talking about disabling, I'd be surprised if it were normal policy,
but if it were, it would be a udev script or a boot script. For an unusual
event (e.g. someone trying to reduce jitter in a benchmark targeting something
else), the interface is simple enough that an admin can poke it directly.
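For the direct-poke case, the flow could be as simple as a couple of shell
writes around the sensitive workload. A sketch only: the sysfs directory and
the "enable_background" attribute name are illustrative placeholders, not the
proposed ABI, and the control directory is passed in so it isn't hard-coded:

```shell
#!/bin/sh
# pause_patrol_scrub: disable background patrol scrub while a
# jitter-sensitive workload runs, then restore the previous setting.
# $1 is the scrub control directory (e.g. somewhere under /sys);
# the attribute name "enable_background" is purely illustrative.
pause_patrol_scrub() {
    scrub_dir="$1"
    shift                                   # remaining args: the workload
    saved=$(cat "$scrub_dir/enable_background")
    echo 0 > "$scrub_dir/enable_background" # stop background scrubbing
    "$@"                                    # run the workload
    echo "$saved" > "$scrub_dir/enable_background"  # restore old setting
}
```

Usage would be something like
`pause_patrol_scrub /sys/class/ras/ras0/scrub ./benchmark`
(again with a made-up path).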

>
> > > 3. Allows to perform on-demand scrubbing for specific address range
> > > if supported by the scrubber.
> > > 4. User space tools controls scrub the memory DIMMs regularly at
> > > a configurable scrub rate using the sysfs scrub controls
> > > discussed help, - to detect uncorrectable memory errors early
> > > before user accessing memory, which helps to recover the detected
> > > memory errors. - reduces the chance of a correctable error
> > > becoming uncorrectable.
>
> Yah, that's not my question: my question is, how is this new thing,
> which is exposed to userspace and which then means, this will be
> supported forever, how is this thing going to be used?
>
> And the next question is: is that interface sufficient for those use
> cases?
>
> Are we covering the majority of the usage scenarios?

To a certain extent this is bounded by what the hardware lets us
do, but agreed, we should make sure it 'works' for the use cases we know
about. The starting point is some more documentation in the patch set
giving common flows (and maybe some example scripts).

>
> > Just to add one more reason a user space interface is needed.
> > 5. Policy control for hotplugged memory. There is not necessarily
> > a system wide bios or similar in the loop to control the scrub
> > settings on a CXL device that wasn't there at boot. What that
> > setting should be is a policy decision as we are trading of
> > reliability vs performance - hence it should be in control of
> > userspace.
> > As such, 'an' interface is needed. Seems more sensible to try and
> > unify it with other similar interfaces than spin yet another one.
>
> Yes, I get that: question is, let's say you have that interface. Now
> what do you do?
>
> Do you go and start a scrub cycle by hand?

Typically no, but the option would be there to support an admin who is
suspicious or who is trying to gather statistics or similar.

>
> Do you have a script which does that based on some system reports?
>

That definitely makes sense for NVDIMM scrub, as the model there is
only ever to run it on demand as a single scrub pass.
For a cyclic scrub, we can spin a policy in rasdaemon or similar to
crank up the frequency if we are getting lots of 'non scrub'
faults (i.e. corrected errors reported on demand accesses).
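The cyclic-scrub half of that policy could look something like the helper
below. A sketch only: the scrub directory, the "cycle_hours" attribute and
its units are hypothetical stand-ins for whatever the final ABI exposes, and
the corrected-error count would come from rasdaemon's own bookkeeping:

```shell
# adjust_scrub_rate: toy policy in the rasdaemon spirit.  If the
# corrected-error count seen since the last check exceeds a threshold,
# halve the scrub cycle length (i.e. scrub twice as often), down to a
# floor of 1.  $1 = scrub control dir, $2 = CE count, $3 = threshold.
# The attribute name "cycle_hours" is illustrative, not a real ABI.
adjust_scrub_rate() {
    scrub_dir="$1" ce_count="$2" threshold="$3"
    cycle=$(cat "$scrub_dir/cycle_hours")
    if [ "$ce_count" -gt "$threshold" ] && [ "$cycle" -gt 1 ]; then
        echo $((cycle / 2)) > "$scrub_dir/cycle_hours"
    fi
}
```

A real policy would obviously need hysteresis and a way to back off again
once the error rate drops; this only shows the shape of the loop body.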

Shiju is our expert on this sort of userspace stats monitoring and
handling so I'll leave him to come back with a proposal / PoC for doing that.

I can see two motivations though:
a) Gather better stats on a suspect device by ensuring more correctable
error detections.
b) Increase scrubbing on a device which is on its way out but not replaceable
yet for some reason.

I would suggest this will be PoC level only for now, as anything
sophisticated will need a lot of testing on large fleets.

> Do you automate it? I wanna say yes because that's miles better than
> having to explain yet another set of knobs to users.

In the first instance, I'd expect a udev policy so that when a new CXL
memory device turns up we set a default value. A cautious admin would tweak
that script to set the default to scrub more often; an admin who
knows they don't care might turn it off. We can include an example of that
in the next version, I think.
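Such a udev policy might look roughly like the rule below. Every identifier
in it is a placeholder: the subsystem name, kernel name pattern and sysfs
attribute paths would all depend on what the final ABI actually exposes.

```
# /etc/udev/rules.d/99-cxl-scrub.rules -- illustrative sketch only.
# On hotplug of a new scrub control instance, enable background patrol
# scrub with a site-chosen default cycle.  "ras"/"ras*" and the
# attribute names are hypothetical, not the proposed ABI.
ACTION=="add", SUBSYSTEM=="ras", KERNEL=="ras*", \
    ATTR{scrub/enable_background}="1", ATTR{scrub/cycle_hours}="12"
```
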
>
> And so on and so on...
>
> I'm trying to get you to imagine the *full* solution and then ask
> yourselves whether that new interface is adequate.
>
> Makes more sense?
>

Absolutely. One area that needs to improve (Dan raised it) is the
association with HPA ranges, so we can easily correlate error reports
with the scrub engine that covers them. That can be done with the existing
version, but it's fiddlier than it needs to be. This 'might' be a userspace
script example, or maybe a case for making the associations tighter in the kernel.

Jonathan

> Thx.
>