Re: [RFC/RFT PATCH] PM: sleep: Ignore device driver suspend() callback return values
From: Florian Fainelli
Date: Thu Dec 05 2024 - 12:57:20 EST
On 12/5/24 09:36, Len Brown wrote:
> On Thu, Dec 5, 2024 at 10:33 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > ...I also think this looks a bit risky as the current behaviour
> > has really been there for a long time. Who knows what depends on this.
>
> If everything were working 100% of the time, no risk would be justified
> because no improvement is possible.
>
> But we run over 1,000,000 suspend resume cycles per release in our lab,
> and this issue, as a category, is the single most common failure.
But you are entering big-number territory here; with that many
iterations, something is eventually going to fail.
How was 1 million iterations determined to be a good pass/fail
criterion, rather than just an arbitrarily high number intended to shake
out issues? Surely with such a big number you start getting an idea of
which specific drivers within your test devices tend to fail to suspend?
FWIW, with the products I work with, which are mainly set-top-box
devices, we set the pass/fail criterion at 100k cycles, which essentially
assumes 27 suspend/resume cycles per day for the next 10 years. Given
the lifespan of the products, that seemed way overboard; realistically
there are going to be more like 2-3 suspend/resume cycles per day.
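The arithmetic behind those figures can be checked with a small userspace sketch. The function names are made up for illustration, and a 365-day year is assumed (leap days ignored):

```c
/* Average suspend/resume cycles per day implied by spreading a total
 * cycle budget over a product lifespan (365-day years assumed). */
static double cycles_per_day(unsigned long total_cycles, unsigned int years)
{
	return (double)total_cycles / (years * 365.0);
}

/* Total cycles accumulated at a fixed daily rate over the same lifespan. */
static unsigned long lifetime_cycles(unsigned int per_day, unsigned int years)
{
	return (unsigned long)per_day * years * 365UL;
}
```

With these, cycles_per_day(100000, 10) comes out at roughly 27.4, while 2-3 cycles per day over 10 years is only 7,300 to 10,950 cycles in total.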
> Worse, there is a huge population of drivers, and we can't possibly test
> them all into correctness. Every release this issue crops up when another
> driver hiccups in response to some device specific transient issue.
>
> The current implementation is not a viable design.
Neither is this approach, because it assumes that drivers that need to
abort the system suspend call pm_system_wakeup(). Most do not; they
return -EBUSY or something like that. There are 12 or so drivers
calling pm_system_wakeup(), which is not the majority.
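To make the concern concrete, here is a toy userspace model, not kernel code. Every model_* name is hypothetical, and the loop only approximates what the PM core does: it contrasts honoring callback return values with relying solely on a wakeup flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's pending-wakeup state (hypothetical). */
static bool model_wakeup_pending;

/* Models the effect of pm_system_wakeup(): flag that suspend must abort. */
static void model_pm_system_wakeup(void)
{
	model_wakeup_pending = true;
}

typedef int (*model_suspend_fn)(void);

/* Common driver style: refuse to suspend through the return value only. */
static int model_driver_returns_ebusy(void)
{
	return -EBUSY;
}

/* Rarer driver style: also raise the wakeup signal before returning. */
static int model_driver_signals_wakeup(void)
{
	model_pm_system_wakeup();
	return -EBUSY;
}

/*
 * Run every suspend callback in order.
 * honor_errors == true:  stop on the first non-zero return value.
 * honor_errors == false: ignore return values; only the wakeup flag aborts.
 * Returns 0 if the suspend would proceed, a negative errno otherwise.
 */
static int model_suspend_all(const model_suspend_fn *cbs, size_t n,
			     bool honor_errors)
{
	model_wakeup_pending = false;
	for (size_t i = 0; i < n; i++) {
		int ret = cbs[i]();

		if (honor_errors && ret)
			return ret;
		if (model_wakeup_pending)
			return -EBUSY;
	}
	return 0;
}
```

In this model, a driver that only returns -EBUSY no longer stops the suspend once return values are ignored, while a driver that raises the wakeup flag still does.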
How about flipping the logic around and introducing a Kconfig-gated
option that lets you ignore the suspend callback return value?
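A minimal sketch of what such gating could look like, again as a userspace model. MODEL_CONFIG_PM_IGNORE_SUSPEND_ERRORS is a made-up macro standing in for a Kconfig symbol that does not exist today, and the helper names are hypothetical:

```c
#include <errno.h>

/* Hypothetical stand-in for a Kconfig symbol; 0 keeps the current
 * behaviour, where a callback error aborts the suspend. */
#ifndef MODEL_CONFIG_PM_IGNORE_SUSPEND_ERRORS
#define MODEL_CONFIG_PM_IGNORE_SUSPEND_ERRORS 0
#endif

/* Pure helper so both settings can be exercised from one build. */
static int model_filter_suspend_error(int ret, int ignore_errors)
{
	return ignore_errors ? 0 : ret;
}

/* What the suspend core would act on under the build-time setting. */
static int model_suspend_result(int ret)
{
	return model_filter_suspend_error(ret,
					  MODEL_CONFIG_PM_IGNORE_SUSPEND_ERRORS);
}
```

With the option off, which is the default here, model_suspend_result(-EBUSY) still returns -EBUSY, so existing users see no change unless they opt in.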
--
Florian