On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote:
On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote:
On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote:
That would be great, but I don't see that here, do you? All I see is
the shutdown sequence changing because someone wants it to go "faster"
with the threat of hardware breaking if we don't meet that "faster"
number, yet no knowledge or guarantee that this number can ever be known
or happen.
The idea was to have somewhere to send notifications when the hardware
starts reporting things like power supplies starting to fail. We do
have those from hardware, we just don't do anything terribly useful
with them yet.
Ok, but that's not what I recall this patchset doing, or did I missing
something? All I saw was a "reorder the shutdown sequence" set of
changes. Or at least that's all I remember at this point in time,
sorry, it's been a few days, but at least that lines up with what the
Subject line says above :)
That's not in the series, a bunch of it is merged in some form (eg, see
hw_protection_shutdown()) and more of it would need to be built on top
if this were merged.
Agreed, but I don't think this patch is going to actually work properly
over time as there is no time values involved :)
This seems to be more into the area of mitigation than firm solution, I
suspect users will be pleased if they can make a noticable dent in the
number of failures they're seeing.
Mitigation is good, but this patch series is just a hack by doing "throw
this device type at the front of the shutdown list because we have
hardware that crashes a lot" :)
Sounds like a mitigation to me.
It feels like if we're concerned about mitigating physical damage during
the process of power failure that's a very limited set of devices - the
storage case where we're in the middle of writing to flash or whatever
is the most obvious case.
Then why isn't userspace handling this? This is a policy decision that
it needs to take to properly know what hardware needs to be shut down,
and what needs to happen in order to do that (i.e. flush, unmount,
etc.?) And userspace today should be able to say, "power down this
device now!" for any device in the system based on the sysfs device
tree, or at the very least, force it to a specific power state. So why
not handle this policy there?
Given the tight timelines it does seem reasonable to have some of this
in the kernel - the specific decisions about how to handle these events
can always be controlled from userspace (eg, with a sysfs file like we
do for autosuspend delay times which seem to be in a similar ballpark).