Re: s2idle breaks on machines without cpuidle support
From: Kazuki
Date: Wed Feb 08 2023 - 06:21:57 EST
On Wed, Feb 08, 2023 at 10:35:11AM +0000, Sudeep Holla wrote:
> On Wed, Feb 08, 2023 at 04:48:18AM +0900, Kazuki wrote:
> > On Mon, Feb 06, 2023 at 10:12:39AM +0000, Sudeep Holla wrote:
> > >
> > > What do you mean by break ? More details on the observation would be helpful.
> > For example, CLOCK_MONOTONIC doesn't stop even after suspend since
> > this chain of calls never runs:
> >
> > call_cpuidle_s2idle->cpuidle_enter_s2idle->enter_s2idle_proper->tick_freeze->sched_clock_suspend (Function that pauses CLOCK_MONOTONIC)
> >
> > Which in turn causes programs like systemd to crash since it doesn't
> > expect this.
>
> Yes expected IIUC. The per-cpu timers and counters continue to tick in
> WFI and hence CLOCK_MONOTONIC can't stop.
Yes, but that shouldn't be the case when suspending [1]. Currently that is
what happens when we enter s2idle without a cpuidle driver. This doesn't
seem to happen with S3 sleep [2].

[1]
Documentation/core-api/timekeeping.rst:

.. c:function:: ktime_t ktime_get( void )

	CLOCK_MONOTONIC

	Useful for reliable timestamps and measuring short time intervals
	accurately. Starts at system boot time but stops during suspend.

[2]
kernel/time/sched_clock.c:

int sched_clock_suspend(void)
{
	struct clock_read_data *rd = &cd.read_data[0];

	update_sched_clock();
	hrtimer_cancel(&sched_clock_timer);
	rd->read_sched_clock = suspended_sched_clock_read;

	return 0;
}

void sched_clock_resume(void)
{
	struct clock_read_data *rd = &cd.read_data[0];

	rd->epoch_cyc = cd.actual_read_sched_clock();
	hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL_HARD);
	rd->read_sched_clock = cd.actual_read_sched_clock;
}

static struct syscore_ops sched_clock_ops = {
	.suspend	= sched_clock_suspend,
	.resume		= sched_clock_resume,
};

static int __init sched_clock_syscore_init(void)
{
	register_syscore_ops(&sched_clock_ops);

	return 0;
}
device_initcall(sched_clock_syscore_init);
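
For anyone who wants to observe the symptom from userspace, here is a
minimal sketch. It is not taken from the kernel tree, and the helper names
ts_to_ns() and boottime_minus_monotonic_ns() are made up for illustration.
It relies only on the documented behaviour that CLOCK_BOOTTIME is
CLOCK_MONOTONIC plus the time spent suspended, so the difference between
the two should grow across a suspend cycle if CLOCK_MONOTONIC really
stopped, and stay roughly constant if it kept running.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Convert a struct timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: returns CLOCK_BOOTTIME minus CLOCK_MONOTONIC in
 * nanoseconds, or -1 if either clock could not be read.
 *
 * CLOCK_MONOTONIC is read first, so the result can only overestimate the
 * real offset by the time that passes between the two reads.
 */
static int64_t boottime_minus_monotonic_ns(void)
{
	struct timespec mono, boot;

	if (clock_gettime(CLOCK_MONOTONIC, &mono) != 0)
		return -1;
	if (clock_gettime(CLOCK_BOOTTIME, &boot) != 0)
		return -1;

	return ts_to_ns(&boot) - ts_to_ns(&mono);
}
```

Calling boottime_minus_monotonic_ns() once before and once after an s2idle
cycle and comparing the two values should show whether the time spent
suspended was excluded from CLOCK_MONOTONIC on a given machine.
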
>
> > >
> > > > 2. Suspend actually works on ARM64 machines even without proper
> > > > cpuidle (PSCI cpuidle) since they support wfi, so the assumption here is wrong
> > > > on such machines
> > > >
> > >
> > > Sorry, I am a bit confused here. Your point (2) contradicts the $subject.
> > drivers/cpuidle/cpuidle.c:
> >
> > bool cpuidle_not_available(struct cpuidle_driver *drv,
> > 			   struct cpuidle_device *dev)
> > {
> > 	return off || !initialized || !drv || !dev || !dev->enabled;
> > }
> >
> > The cpuidle framework reports ARM64 devices without PSCI cpuidle as
> > "cpuidle not available" even when they support wfi, which causes suspend
> > to fail, which shouldn't be happening since they do support idling.
>
> Yes with just WFI, there will be no active cpuidle driver.
>
> [...]
>
> > > Again, since s2idle is userspace driven, I don't understand what you
> > > mean by unbootable kernel in the context of s2idle.
> >
> > Sorry, I meant "attempts to fix this bug have all led to an unbootable
> > kernel."
>
> Again I assume you mean kernel hang or crash and nothing to do with boot.
> Once you enter s2i state with your changes/fix, it hangs or is unresponsive
> as it might have either failed to enter or resume from the state.
>
> --
> Regards,
> Sudeep
Thanks,
Kazuki