Re: [PATCH v2] cpuidle: poll_state: Add time limit to poll_idle()

From: Rafael J. Wysocki
Date: Wed Mar 14 2018 - 08:08:18 EST


On Wednesday, March 14, 2018 1:04:50 PM CET Peter Zijlstra wrote:
> On Mon, Mar 12, 2018 at 10:36:27AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > If poll_idle() is allowed to spin until need_resched() returns 'true',
> > it may actually spin for a much longer time than expected by the idle
> > governor, since set_tsk_need_resched() is not always called by the
> > timer interrupt handler. If that happens, the CPU may spend much
> > more time than anticipated in the "polling" state.
> >
> > To prevent that from happening, limit the time of the spinning loop
> > in poll_idle().
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> >
> > -> v2: After additional testing reduce POLL_IDLE_TIME_CHECK_COUNT to 1000.
> >
> > ---
> > drivers/cpuidle/poll_state.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/cpuidle/poll_state.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/poll_state.c
> > +++ linux-pm/drivers/cpuidle/poll_state.c
> > @@ -5,16 +5,31 @@
> > */
> >
> > #include <linux/cpuidle.h>
> > +#include <linux/ktime.h>
> > #include <linux/sched.h>
> > #include <linux/sched/idle.h>
> >
> > +#define POLL_IDLE_TIME_CHECK_COUNT 1000
> > +#define POLL_IDLE_TIME_LIMIT (TICK_NSEC / 16)
> > +
> > static int __cpuidle poll_idle(struct cpuidle_device *dev,
> > struct cpuidle_driver *drv, int index)
> > {
> > + ktime_t start = ktime_get();
>
> I would recommend not using ktime_get(), imagine the 'joy' if that
> happens to be the HPET.
>
> > local_irq_enable();
> > if (!current_set_polling_and_test()) {
> > + unsigned int time_check_counter = 0;
> > +
> > + while (!need_resched()) {
> > cpu_relax();
> > + if (time_check_counter++ < POLL_IDLE_TIME_CHECK_COUNT)
> > + continue;
> > +
> > + time_check_counter = 0;
> > + if (ktime_sub(ktime_get(), start) > POLL_IDLE_TIME_LIMIT)
> > + break;
> > + }
> > }
> > current_clr_polling();
>
> Since the idle loop is strictly per-cpu, you can use regular
> sched_clock() here. Something like:
>
> 	u64 start = sched_clock();
>
> 	local_irq_enable();
> 	if (!current_set_polling_and_test()) {
> 		while (!need_resched()) {
> 			cpu_relax();
>
> 			if (sched_clock() - start > POLL_IDLE_TIME_LIMIT)
> 				break;
> 		}
> 	}
> 	current_clr_polling();

Good idea!

> On x86 we don't have to use that time_check_counter thing, sched_clock()
> is really cheap, not sure if it makes sense on other platforms.

Let's do it the way you suggested. If there are issues with it, we
can still add the counter back later.
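
For illustration only, here is a rough userspace sketch of the two
variants discussed in this thread. It is not the kernel code:
resched_flag, now_ns() and poll_with_limit() are hypothetical names,
clock_gettime(CLOCK_MONOTONIC) merely stands in for sched_clock(), and
cpu_relax() is reduced to a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the condition tested by need_resched(). */
static atomic_bool resched_flag;

/* Userspace stand-in for sched_clock(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until resched_flag is set or more than limit_ns has elapsed.
 *
 * check_every == 1 reads the clock on every pass (no counter);
 * a larger value reads it only on every Nth pass, which is what the
 * counter in the v2 patch was for.
 *
 * Returns true if the loop ended because of the time limit.
 */
static bool poll_with_limit(uint64_t limit_ns, unsigned int check_every)
{
	uint64_t start = now_ns();
	unsigned int count = 0;

	assert(check_every > 0);

	while (!atomic_load(&resched_flag)) {
		/* In the kernel, cpu_relax() would be called here. */
		if (++count < check_every)
			continue;

		count = 0;
		if (now_ns() - start > limit_ns)
			return true;
	}
	return false;
}
```

With check_every set to 1 this has the shape of the loop suggested
above. Passing something like 1000 gives the counter-based behavior of
v2, which would only be worth it where reading the clock is expensive.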

Thanks!