Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
From: Kelly Rossmoyer
Date: Sat Jan 29 2022 - 02:53:57 EST
On Tue, Jan 25, 2022 at 9:09 PM Zichar Zhang <zichar.zhang@xxxxxxxxxx> wrote:
>
> hello John, Kelly!
Hello! Happy to see what the future may bring in this space, and looking
forward to what we can all do to move this area forward.
> -- If these IRQs do hanpen, the code in this commit will catch
> them as "unknown wakeup reason" and suggest user to check the
> kernel log for more information.
I would argue that - at least in Android context, as that's what I know -
the kernel log is not a substitute for wakeup reason capture.
1) It is common to find yourself troubleshooting battery life issues for
which kernel log data is not unavailable.
2) Troubleshooting is not the only consideration. Since "why aren't we in
suspend right now?" is such a key question for mobile device battery
life, this is also about power attribution. And I think trying to build
that upon live kernel log parsing would be both inefficient and brittle.
(But my lack of knowledge is vast, so maybe that's a solvable problem?)
> >> * abort reasons, including:
> >> * wakeup_source activity
> >> * failure to freeze userspace
> >> * failure to suspend devices
> >> * failed syscore_suspend callback
>
> -- As mentioned before, if these "abort action" happened, you
> can catch string "unknown wakeup reason", and check the kernel
> log then.
I don't think the kernel log is a solution here. Suspend aborts can be a
significant fraction of how suspend attempts end, making them a key
contributor to battery drain. If the eventual set of patches doesn't solve
for at least the most common kinds of suspend aborts, that leaves a lot of
power observability off the table. And again, at least on Android, kernel
log content is often not available for the interval when a series of
suspend aborts contributed to power drain.
> The patch is not complete, these is the next steps:
> 1. add interface to show time spend in suspend/resume work.
> (after this, I think it could be works and replace the android patch)
I believe Android would experience a significant regression in capability
with only those two patches implemented, so that would require a lot of
careful consideration.
> 2. we need solve the "unmapped HW IRQs", "misconfigured IRQs" and
> "threaded IRQs" problem.
> (this is the hardest one)
FWIW, I'm already expecting unmapped HW IRQs and misconfigured IRQs not to
make the cut. Those have helped solve real and recurring issues, but
they're admittedly niche solutions to infrequent problems at the expense of
messy code coupling, so... maybe not broadly beneficial for Linux.
Threaded wakeup IRQs are a different matter. As one example, there are
architectures in which the RTC wakeup IRQ is threaded and knowing "this
string of wakeups was due to the RTC" is a lot more useful than "this
string of wakeups was due to an irqchip with tens of child IRQs".
> ... So it just report a "wakeup reason" from
> interrupt subsystem. It just a coincidence that most hardware "wakeup
> reason" is also the interupt signal. Even the "interrupt" and "wake up"
> signal are separated from each other in GIC700.
> So give them a chance to report the their "wakeup reason".)
Agreed. But I do hope we can find a way to bring them together so a single
story is presented to userspace. Or, barring that, at least make it easy
for userspace to figure that out after the fact.
Thanks so much for the work you are doing on this and taking the time to
walk me through your thoughts. I'm happy to contribute in whatever manner
adds value.
--
Kelly Rossmoyer | Software Engineer | krossmo@xxxxxxxxxx | 858-239-4111