Re: [RFC 00/10] Introduce In Field Scan driver

From: Dan Williams
Date: Tue Mar 15 2022 - 12:09:34 EST


On Tue, Mar 15, 2022 at 9:04 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On Tue, Mar 15, 2022 at 8:27 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Mar 15, 2022 at 02:59:03PM +0000, Luck, Tony wrote:
> > > >> This seems a novel use of uevent ... is it OK, or is is abuse?
> > > >
> > > > Don't create "novel" uses of uevents. They are there to express a
> > > > change in state of a device so that userspace can then go and do
> > > > something with that information. If that pattern fits here, wonderful.
> > >
> > > Maybe Dan will chime in here to better explain his idea. I think for
> > > the case where the core test fails, there is a good match with uevent.
> > > The device (one CPU core) has changed state from "working" to
> > > "untrustworthy". Userspace can do things like: take the logical CPUs
> > > on that core offline, initiate a service call, or in a VMM cluster environment
> > > migrate work to a different node.
> >
> > Again, I have no idea what you are doing at all with this driver, nor
> > what you want to do with it.
> >
> > Start over please.
> >
> > What is the hardware you have to support?
> >
> > What is the expectation from userspace with regards to using the
> > hardware?
>
> Here is what I have learned about this driver since engaging on this
> patch set. Cores go bad at run time. Datacenters can detect them at
> scale.

Tony pointed me to this video if you have not seen it:

https://www.youtube.com/watch?v=QMF3rqhjYuM