[PATCH] sched: Make the idle timer expire always in hardirq context.

From: Sebastian Andrzej Siewior
Date: Mon Sep 06 2021 - 07:30:39 EST


The Intel powerclamp driver sets up a per-CPU worker with RT priority.
The worker then invokes play_idle(), where it stays in the idle poll
loop until the timer it armed beforehand expires and stops it.
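As a rough userspace analogue of this pattern, assuming a POSIX SIGALRM
handler stands in for the hrtimer interrupt: the caller arms a one-shot
timer and then spins on a flag that only the timer callback sets. The
names play_idle_sketch() and idle_timer_fn() are hypothetical; this is
not the kernel implementation, only an illustration of why the loop
depends on the callback running asynchronously.

```c
#define _XOPEN_SOURCE 700	/* sigaction(), setitimer(), clock_gettime() */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Flag set asynchronously by the "timer interrupt" (a SIGALRM handler here). */
static volatile sig_atomic_t idle_done;

/* Plays the role of idle_inject_timer_fn(): it only flags completion. */
static void idle_timer_fn(int sig)
{
	(void)sig;
	idle_done = 1;
}

/*
 * Hypothetical analogue of play_idle_precise(): arm a one-shot timer,
 * then poll until its handler fires. Returns the time spent polling in
 * nanoseconds, or -1 if the timer could not be set up.
 */
static long long play_idle_sketch(long duration_us)
{
	struct sigaction sa;
	struct itimerval it;
	struct timespec t0, t1;

	assert(duration_us > 0);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = idle_timer_fn;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	memset(&it, 0, sizeof(it));	/* it_interval stays 0: one-shot */
	it.it_value.tv_sec = duration_us / 1000000;
	it.it_value.tv_usec = duration_us % 1000000;

	idle_done = 0;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (setitimer(ITIMER_REAL, &it, NULL) != 0)
		return -1;

	/* Stand-in for "while (!READ_ONCE(it.done)) do_idle();" */
	while (!idle_done)
		;

	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

The loop only terminates because the signal handler interrupts the
spinning code on its own; if completion depended on some other,
lower-priority thread getting CPU time, nothing would end the spin.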

On PREEMPT_RT that timer must expire in hardirq context. Otherwise it
expires as a SOFT timer in ksoftirqd, but ksoftirqd never gets to run on
that CPU because its priority is lower than that of the worker spinning
in the idle loop.
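The starvation can be illustrated with a toy single-CPU, strict-priority
model. In this hypothetical sketch, priorities are plain integers where a
larger value wins (as with RT priorities), ksoftirqd is assumed to rank
below the worker, and simulate_idle_inject() is an invented name; it
models the argument above, not the real scheduler or softirq code.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the (hypothetical) idle timer callback runs. */
enum expiry { EXPIRE_HARD, EXPIRE_SOFT };

/*
 * Toy single-CPU model with discrete ticks and strict priorities (a larger
 * number wins). The worker is always runnable because it polls in the idle
 * loop until it sees "done". Returns the tick at which the worker observes
 * "done", or -1 if it is still polling when the horizon is reached.
 */
static int simulate_idle_inject(enum expiry mode, int expire_tick, int horizon,
				int worker_prio, int ksoftirqd_prio)
{
	bool done = false;		/* models it.done */
	bool softirq_pending = false;	/* ksoftirqd has work to do */

	assert(expire_tick >= 0 && expire_tick < horizon);

	for (int tick = 0; tick < horizon; tick++) {
		/* The timer interrupt preempts whatever task is running. */
		if (tick == expire_tick) {
			if (mode == EXPIRE_HARD)
				done = true;		/* callback runs in hardirq */
			else
				softirq_pending = true;	/* deferred to ksoftirqd */
		}

		if (done)
			return tick;	/* worker leaves the poll loop */

		/*
		 * Scheduling decision: ksoftirqd only gets the CPU if it
		 * outranks the always-runnable worker.
		 */
		if (softirq_pending && ksoftirqd_prio > worker_prio) {
			softirq_pending = false;
			done = true;	/* SOFT callback finally runs */
		}
		/* Otherwise the worker keeps spinning for this tick. */
	}
	return -1;
}
```

With EXPIRE_HARD the worker stops at the expiry tick. With EXPIRE_SOFT and
ksoftirqd ranked below the worker, the callback never runs and the model
reaches its horizon still polling.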

Always expire the idle timer in hardirq context.
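Both call sites change because the hard/soft decision is recorded when
the timer is initialized and, as we read kernel/time/hrtimer.c,
hrtimer_start_range_ns() on PREEMPT_RT warns when the HARD bit of the
start mode disagrees with the one used at init. The sketch below mirrors
the relevant enum hrtimer_mode bits from include/linux/hrtimer.h in
userspace; expires_soft() and start_mode_consistent() are hypothetical
helpers that model __hrtimer_init() and that check, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mirror of the enum hrtimer_mode bits used by this patch. */
enum hrtimer_mode {
	HRTIMER_MODE_ABS	= 0x00,
	HRTIMER_MODE_REL	= 0x01,
	HRTIMER_MODE_PINNED	= 0x02,
	HRTIMER_MODE_SOFT	= 0x04,
	HRTIMER_MODE_HARD	= 0x08,

	HRTIMER_MODE_REL_PINNED	     = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED,
	HRTIMER_MODE_REL_HARD	     = HRTIMER_MODE_REL | HRTIMER_MODE_HARD,
	HRTIMER_MODE_REL_PINNED_HARD = HRTIMER_MODE_REL_PINNED | HRTIMER_MODE_HARD,
};

/* Hypothetical model of the init-time choice: does the timer expire in softirq? */
static bool expires_soft(unsigned int init_mode, bool preempt_rt)
{
	/* On PREEMPT_RT, timers not explicitly marked HARD are moved to softirq. */
	if (preempt_rt && !(init_mode & HRTIMER_MODE_HARD))
		return true;
	return init_mode & HRTIMER_MODE_SOFT;
}

/* Hypothetical model of the start-time check: modes must agree on the relevant bit. */
static bool start_mode_consistent(unsigned int init_mode, unsigned int start_mode,
				  bool preempt_rt)
{
	unsigned int bit = preempt_rt ? HRTIMER_MODE_HARD : HRTIMER_MODE_SOFT;

	return !(init_mode & bit) == !(start_mode & bit);
}
```

With these helpers, the pre-patch HRTIMER_MODE_REL at init is reported as
a soft-expiring timer on PREEMPT_RT, and pairing HRTIMER_MODE_REL_HARD at
init with plain HRTIMER_MODE_REL_PINNED at start fails the consistency
check, which is why both calls switch to the _HARD variants together.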

Fixes: c1de45ca831ac ("sched/idle: Add support for tasks that inject idle")
Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
kernel/sched/idle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 912b47aa99d82..d17b0a5ce6ac3 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -379,10 +379,10 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	cpuidle_use_deepest_state(latency_ns);
 
 	it.done = 0;
-	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
 	it.timer.function = idle_inject_timer_fn;
 	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
-		      HRTIMER_MODE_REL_PINNED);
+		      HRTIMER_MODE_REL_PINNED_HARD);
 
 	while (!READ_ONCE(it.done))
 		do_idle();
--
2.33.0