Re: [PATCH v1 0/3] introduce priority-based shutdown support

From: Greg Kroah-Hartman
Date: Sat Nov 25 2023 - 04:13:14 EST


On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote:
> On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote:
> > On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > > > >
> > > > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > > > failure notifications.
> > > > > > >
> > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the
> > > > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > > > are going to be shut down in X number of seconds when asked? If so, why
> > > > > > > > not do this in userspace?
> > > > > > >
> > > > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > > > notifications from regulators that they're in trouble and we have some
> > > > > > > small amount of time to do anything we might want to do about it before
> > > > > > > we expire.
> > > > > >
> > > > > > So we are going to guarantee a "time" in which we are going to do
> > > > > > something? Again, if that's required, why not do it in userspace using
> > > > > > a RT kernel?
> > > > >
> > > > > For the HW in question I have only 100ms time before power loss. By
> > > > > doing it over use space some we will have even less time to react.
> > > >
> > > > Why can't userspace react that fast? Why will the kernel be somehow
> > > > faster? Speed should be the same, just get the "power is cut" signal
> > > > and have userspace flush and unmount the disk before power is gone. Why
> > > > can the kernel do this any differently?
> > > >
> > > > > In fact, this is not a new requirement. It exist on different flavors of
> > > > > automotive Linux for about 10 years. Linux in cars should be able to
> > > > > handle voltage drops for example on ignition and so on. The only new thing is
> > > > > the attempt to mainline it.
> > > >
> > > > But your patch is not guaranteeing anything, it's just doing a "I want
> > > > this done before the other devices are handled", that's it. There is no
> > > > chance that 100ms is going to be a requirement, or that some other
> > > > device type is not going to come along and demand to be ahead of your
> > > > device in the list.
> > > >
> > > > So you are going to have a constant fight among device types over the
> > > > years, and people complaining that the kernel is now somehow going to
> > > > guarantee that a device is shutdown in a set amount of time, which
> > > > again, the kernel can not guarantee here.
> > > >
> > > > This might work as a one-off for a specific hardware platform, which is
> > > > odd, but not anything you really should be adding for anyone else to use
> > > > here as your reasoning for it does not reflect what the code does.
> > >
> > > I see. Good point.
> > >
> > > In my case umount is not needed, there is not enough time to write down
> > > the data. We should send a shutdown command to the eMMC ASAP.
> >
> > If you don't care about the data, why is a shutdown command to the
> > hardware needed? What does that do that makes anything "safe" if your
> > data is lost.
>
> It prevents HW damage. In a typical automotive under-voltage labor it is
> usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> amount of under-voltage cycles (I do not have exact numbers right now).
> Even if the numbers not so high in the labor tests (sometimes something
> like one bricked device in a month of tests), the field returns are
> significant enough to care about software solution for this problem.

So hardware is attempting to rely on software in order to prevent the
destruction of that same hardware? Surely hardware designers aren't
that crazy, right? (rhetorical question, I know...)

> Same problem was seen not only in automotive devices, but also in
> industrial or agricultural. With other words, it is important enough to bring
> some kind of solution mainline.

But you are not providing a real solution here, only a "I am going to
attempt to shut down a specific type of device before the others, there
are no time or ordering guarantees here, so good luck!" solution.

And again, how are you going to prevent the in-fighting of all device
types to be "first" in the list?

thanks,

greg k-h