Re: [PATCH v2 6/6] platform/x86: ayaneo-ec: Add suspend hook
From: Guenter Roeck
Date: Wed Oct 29 2025 - 06:22:31 EST
On 10/29/25 01:48, Antheas Kapenekakis wrote:
On Wed, 29 Oct 2025 at 04:36, Mario Limonciello (AMD) (kernel.org)
<superm1@xxxxxxxxxx> wrote:
On 10/28/2025 4:39 PM, Antheas Kapenekakis wrote:
On Tue, 28 Oct 2025 at 22:21, Mario Limonciello <superm1@xxxxxxxxxx> wrote:
On 10/28/25 3:34 PM, Antheas Kapenekakis wrote:
The fan speed is also lost during hibernation, but since hibernation
failures are common with this class of devices
Why are hibernation failures more common in this class of device than
anything else? The hibernation flow is nearly all done in Linux driver
code (with the exception of ACPI calls that move devices into D3 and out
of D0).
I should correct myself here and say hibernation in general in Linux
leaves something to be desired.
Until secure boot supports hibernation, that will be the case because
not enough people use it.
The upstream kernel has no tie between UEFI secure boot and hibernation.
I think you're talking about some distro kernels that tie UEFI secure
boot to lockdown. Lockdown does currently prohibit hibernation.
I have had it break for multiple reasons, not incl. the ones below and
the ones we discussed last year where games are loaded.
For a few months I fixed some of the bugs but it is not sustainable.
Perhaps you're seeing a manifestation of a general issue that we're
working on a solution for here:
https://lore.kernel.org/linux-pm/20251025050812.421905-1-safinaskar@xxxxxxxxx/
https://lore.kernel.org/linux-pm/20251026033115.436448-1-superm1@xxxxxxxxxx/
https://lore.kernel.org/linux-pm/5935682.DvuYhMxLoT@rafael.j.wysocki/T/#u
Or if you're on an older kernel and using hybrid sleep we had a generic
issue there as well which was fixed in 6.18-rc1.
Nonetheless; don't make policy decisions based upon kernel bugs. Fix
the kernel bugs.
My problem is I cannot in good conscience restore a fan speed before
the program responsible for it is guaranteed to thaw.
The best solution I can come up with would be in freeze save if manual
control is enabled, disable it, and then on resume set a flag that
makes the first write to fan speed also set pwm to manual.
This way suspend->hibernate flows, even if hibernation hangs when
creating the image, at least have proper fan control because they are
unattended, and resume hangs work similarly.
Antheas
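For illustration only, the freeze/resume idea described above can be
sketched as a small state machine. This is not code from the patch
series: struct fan_state, fan_freeze(), fan_resume() and fan_write_pwm()
are hypothetical names, and the real driver would talk to the EC rather
than flip fields in a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver state; none of these names come from ayaneo-ec. */
struct fan_state {
	bool manual;          /* EC is currently in manual PWM mode */
	bool saved_manual;    /* manual mode was active when we froze */
	bool rearm_on_write;  /* next speed write must re-enable manual mode */
	uint8_t pwm;          /* last requested fan speed */
};

/* freeze: remember the mode, then hand the fan back to the EC. */
static void fan_freeze(struct fan_state *s)
{
	s->saved_manual = s->manual;
	s->manual = false;
}

/* resume/restore: do not touch the fan yet, only arm the flag. */
static void fan_resume(struct fan_state *s)
{
	s->rearm_on_write = s->saved_manual;
}

/* First write after resume also switches back to manual mode. */
static void fan_write_pwm(struct fan_state *s, uint8_t pwm)
{
	if (s->rearm_on_write) {
		s->manual = true;
		s->rearm_on_write = false;
	}
	s->pwm = pwm;
}
```

With this shape, a hang anywhere between freeze and the first userspace
write leaves the fan under EC control, because manual mode only returns
once the controlling program has proven it is running again.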
This sounds like a workable approach for what I understand to be your
current design; but let me suggest some other ideas.
What happens if you're running something big and the OOM comes and
whacks the process? Now you don't have fan control running anymore.
So I see two options to improve things.
1) You can have userspace send a "heartbeat" to kernel space. This can
be as simple as a timestamp of reading a sysfs file. If userspace
doesn't read the file in X ms then you turn off manual control.
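A minimal sketch of that heartbeat logic, written as plain C so it can
be read outside the kernel. The names (struct fan_heartbeat,
fan_heartbeat_touch(), fan_heartbeat_check()) and the millisecond clock
passed in by the caller are assumptions for illustration; a real driver
would presumably use jiffies and delayed work instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state; not taken from any existing driver. */
struct fan_heartbeat {
	uint64_t last_beat_ms;  /* time of the last userspace read */
	uint64_t timeout_ms;    /* the "X ms" from the suggestion above */
	bool manual;            /* true while userspace owns the fan */
};

/* Called from the sysfs read handler: userspace is still alive. */
static void fan_heartbeat_touch(struct fan_heartbeat *hb, uint64_t now_ms)
{
	hb->last_beat_ms = now_ms;
}

/*
 * Called periodically. Returns true if manual control was revoked by
 * this call, i.e. the caller should switch the EC back to automatic.
 */
static bool fan_heartbeat_check(struct fan_heartbeat *hb, uint64_t now_ms)
{
	if (!hb->manual)
		return false;
	if (now_ms - hb->last_beat_ms <= hb->timeout_ms)
		return false;
	hb->manual = false;
	return true;
}
```

The check is independent of why userspace went quiet, so an OOM kill, a
crash and a hang during thaw would all be handled by the same path.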
The OOM scenario is something I have not handled yet specifically, or
have had happen.
Systemd will restart the service in the case of OOM after 5 seconds
and in the case of a crash there are multiple fallbacks to ensure the
custom curve turns off.
Most of the hibernation hangs that I have experienced happen before
journalctl turns on, so I assumed that it's before userspace
unfreezes. I am also not sure if restore() gets to run in those cases
or not.
Re: heart beat, read below.
2) You move everything to a kthread. Userspace can read some input
options or maybe pick a few curve settings, but leave all the important
logic in that kthread.
I think this is what Luke tried to do with the Zotac Zone. But in the
end, the kernel is limited to what calculations it can do, esp.
floating point and what it can access, so you end up with a worse
curve with limited extendability, and a driver specific ABI. And we
also risk duplicating all of this code on hwmon drivers and making it
harder to access.
I think part of this reason is why the platform side of the Zotac
stuff has not been upstreamed, even though the driver itself other
than that is pretty straightforward with an established ABI by now.
And it is also the reason we have not been able to add the module to
Bazzite, because 1) we cannot validate the new fan curve calculations
without a device and 2) they are worse than what we provide through
userspace (a polynomial ramp-up which embeds hysteresis to avoid
jittering, plus choice for both Edge and Tctl sensors).
In summary, I think there would be great potential for a common set of
"hwmon" helpers that can use a temperature function and a speed set
function to handle a basic multi-point curve for basic, e.g., udev
use-cases. To that end, there could be a helper with a 5 second
timeout that turns off the custom speed. But it would be good for that
to be implemented globally, so it does not block device hw enablement.
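As a rough illustration of what a "basic multi-point curve" could look
like with integer-only math (no floating point, as noted above), here is
a linear interpolation sketch. struct curve_point and curve_eval() are
hypothetical names, temperatures are assumed to be in millidegrees
Celsius, and the points are assumed to be sorted by strictly increasing
temperature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical curve point: temperature in millidegrees C, PWM 0..255. */
struct curve_point {
	int temp_mc;
	uint8_t pwm;
};

/*
 * Map a temperature to a PWM value by linear interpolation between
 * the two surrounding points. Below the first point the first PWM is
 * used, above the last point the last PWM is used. An empty curve
 * fails safe to full speed.
 */
static uint8_t curve_eval(const struct curve_point *pts, size_t n, int temp_mc)
{
	if (n == 0)
		return 255;
	if (temp_mc <= pts[0].temp_mc)
		return pts[0].pwm;

	for (size_t i = 1; i < n; i++) {
		if (temp_mc <= pts[i].temp_mc) {
			int span = pts[i].temp_mc - pts[i - 1].temp_mc;
			int rise = (int)pts[i].pwm - (int)pts[i - 1].pwm;
			int off = temp_mc - pts[i - 1].temp_mc;

			return (uint8_t)(pts[i - 1].pwm + rise * off / span);
		}
	}
	return pts[n - 1].pwm;
}
```

This covers only the piecewise-linear case; the polynomial ramp-up and
hysteresis mentioned earlier would need more state than this sketch
keeps.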
Maybe I misunderstand. If so, apologies.
Thermal _control_ is what the thermal subsystem is for. hwmon is for
hardware monitoring, not control. You may do whatever you like
in platform drivers, including the duplication of thermal subsystem
functionality, but please do not get hwmon involved. That includes
any kind of helpers to compute any kind of temperature curves.
Thanks,
Guenter