Re: Problem when function alarmtimer_suspend returns 0 if time delta is zero

From: John Stultz
Date: Fri Feb 10 2023 - 20:04:44 EST


On Thu, Feb 9, 2023 at 7:40 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Thu, Feb 09 2023 at 12:19, Michael Nazzareno Trimarchi wrote:
> > On Wed, Feb 8, 2023 at 7:06 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> I wrote that patch against the back then mainline code. No idea if it's
> >> still applying, but the underlying issue is still the same AFAICT.
> >>
> >> It needs some polishing and a proper changelog.
> >>
> > Ok, I will try to update it on some mainline kernel in my environment
> > and test it back. I need
> > a little information if it's possible. Consider that I have no
> > experience in this area. I understand how
> > code was designed in general but the part around the freezer and all
> > those code you remove, what was the logic behind in the removed code?
>
> What I can oracle out of that well commented gem is:
>
> A userspace task invokes clock_nanosleep(CLOCK_*_ALARM, ...), which
> arms an alarm timer. The expiry of an alarmtimer causes the system to
> come out of suspend.
>
> As the task invokes schedule() it can also be brought back from
> schedule() via a signal. If that happens then the task cancels the
> alarmtimer and returns to handle the signal. While doing that it can
> be frozen, which means the alarm and therefore the wake from suspend
> is lost.
>
> To prevent that the code tries to save the earliest expiring alarm if
> the task is marked freezing() and the suspend code takes that into
> account.
>
> John, did I summarize that correctly?
>
> The change I made remove that magic and marks the task freezable when it
> goes to schedule, which prevents the signal wakeup. That ensures that
> the alarm stays armed during freeze/suspend.
>
> That obviously needs some testing and scrunity by the folks which use
> this mechanism. IIRC that's used by android, but I might be wrong.

So, thanks for dredging this old thread up, I'm sorry I didn't see it
the first time it came around.

Not having a clear memory of why we do the (min == 0) early return, I
went digging in, and found it was in the original git commit, so I
went looking to the archives.

It's completely not present in the first version of the patch:
https://lore.kernel.org/lkml/1288809079-14663-8-git-send-email-john.stultz@xxxxxxxxxx/

But it did appear in the second version:
https://lore.kernel.org/lkml/1290136329-18291-6-git-send-email-john.stultz@xxxxxxxxxx/

And from there it's a clear case of wanting to avoid setting the RTC
if there were just no timers to expire.

But, indeed this is a bug, as initializing min to ktime_set(0, 0) as
the "invalid" case isn't a good plan, as it might be possible that
suspend is run right as an alarmtimer expires, and you get a real zero
delta value (as has been reported).

Instead it seemed I should have used KTIME_MAX as the "invalid" case
(as Thomas' patch uses).

However, before the patch was merged, it changed further to handle the
freezer waking a current sleeper:
https://lore.kernel.org/lkml/1294280159-2513-13-git-send-email-john.stultz@xxxxxxxxxx/

Which was then used to initialize the min value (still erroneously
using 0 as an "invalid" value) in the case that the freezer woke a
task sleeping which would cause the alarm to be lost (as Thomas
summarized).

Thomas' patch fixes the erronious 0-as-invalid initialization issue
using KTIME_MAX but also simplifies the logic getting rid of the
freezer handling.

I don't have as much familiarity with the freezer handling change, so
while it looks sane, I can't say I would likely catch an issue doing a
visual review.

Michael: If you are still intending to send the patch out, please do,
otherwise I've already forward ported it so I can do some testing with
it. I'm happy to put together a commit message and send it out if
that's easier for you.

And, just for correct Reported-by: tags: Michael Trimarchi, are you
the same Michael (michael@xxxxxxxxx) that originally reported this
issue?

thanks
-john