Re: alarm timer/timerfd expiration does not abort suspend operation
From: John Stultz
Date: Fri Feb 10 2017 - 13:49:57 EST
On Fri, Feb 10, 2017 at 1:43 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Fri, 3 Feb 2017, Gabriel Beddingfield wrote:
>> Hi Thomas and John,
>> TL;DR: if an alarmtimer-backed timerfd expires just prior to
>> alarmtimer_suspend() begin called, the system will continue to go into
>> suspend (with possibly no future wakeups scheduled). The expected behavior
>> is that the timer expiration would cause the suspend operation to abort. I
>> see several ways to fix it and want to know your preference.
>> When using an alarmtimer-backed timerfd (i.e. CLOCK_BOOTTIME_ALARM or
>> CLOCK_REALTIME_ALARM) we have observed the following race condition:
>> 1. Userspace commands the system to go into suspend (echo mem >
>> 2. The alarmtimer for a timerfd expires, making the timer inactive until
>> someone reads from the file descriptor.
>> 3. alarmtimer_suspend() does not find any pending timers, and therefore
>> does not schedule a wakeup.
>> 4. device goes into suspend.
>> However, if steps 2 and 3 are swapped, alarmtimer_suspend() would have seen
>> that an expiration was "soon" and cause an abort of the suspend. This can
>> be reproduced on an idle system by having a process aggressively doing
>> `echo mem > /sys/power/state' while another process sets a 4-sec repeating
>> timerfd backed by CLOCK_BOOTTIME_ALARM. Eventually the system will go to
>> sleep and not wake up.
>> I see a few ways to fix it:
>> 1. Create a wakeup_source for each timerfd, and if it's an alarm timer call
>> __pm_stay_awake() in timerfd_triggered() and __pm_relax() in timerfd_read().
>> 2. call pm_system_wakeup() in alarmtimer_fired()
>> 3. call `if (isalarm(ctx)) pm_system_wakeup();' in timerfd_triggered()
>> 4. call __pm_wakeup_event(ws, 2 * MSECS_PER_SEC) in alarmtimer_fired()
>> 5. call `if (isalarm(ctc)) __pm_wakeup_event(ws, 2 * MSECS_PER_SEC);' in
>> timerfd_triggered() (using a static struct wakeup_source).
>> I think #1 is right, followed by #2. They all have pros/cons:
>> * #1 Can eliminate race conditions (rather than an arbitrary 2-sec
>> timeout)... but is effectively holding a hard-to-trace block on all PM
>> operations (e.g. read/write of /sys/power/wakeup_count blocks until someone
>> reads from the file descriptor).
>> * #2 Matches the current behavior of the "happy case"... but bypasses the
>> userspace policy system provided by wakeup.
>> * #3 same pro/con as #2... but solution is specific to timerfd's.
>> * #4 Matches the current behavior of the "happy case" if and only if
>> userspace is using the 'wakeup' system, otherwise doesn't change any
>> behavior. But, I wonder how many people think the current behavior is a bug.
>> * #5 Same pro/con as #4... but solution is specific to timerfd's.
>> I've attached a patch that implements #1.
Sorry for not getting back to you earlier (been traveling this week).
I'm surprised this issue hasn't bitten any of the android folks, as I
believe they have been making use of this for some time now. CC'ing a
few just to make sure we're all on the same page.
The approach you took in your patch looks basically ok to me, though I
think the __pm_wakeup_event() method in #4 sounds safer, just to avoid
the problematic issue if no one is waiting on the fd.
Though I worry I'm not quite understanding the con for that case
properly, so maybe you can clarify what concerns you there?