[RFC] Repeated rto_push_irq_work_func() invocation.
From: Sebastian Andrzej Siewior
Date: Wed Oct 02 2024 - 07:21:41 EST
I have this in my RT queue:
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2193,8 +2193,11 @@ static int rto_next_cpu(struct root_doma
 		rd->rto_cpu = cpu;
-		if (cpu < nr_cpu_ids)
+		if (cpu < nr_cpu_ids) {
+			if (!has_pushable_tasks(cpu_rq(cpu)))
+				continue;
 			return cpu;
+		}
 		rd->rto_cpu = -1;
This avoided a large number of IPIs to queue and invoke rto_push_work
while an RT task was scheduled. The situation improved with commit
612f769edd06a ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask").
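To make the effect of the extra check easier to follow, here is a minimal
userspace sketch of the loop. It is not the kernel code: struct rd_model,
mask_next(), rto_next_cpu_model(), the plain bitmasks and NR_CPUS are all
made up for illustration, and locking and memory ordering are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* illustrative size only */

/* Hypothetical stand-in for the root_domain fields used by the loop. */
struct rd_model {
	uint32_t rto_mask;	/* bit n set: CPU n is marked RT-overloaded */
	uint32_t pushable;	/* bit n set: CPU n has a pushable task queued */
	int rto_cpu;		/* last CPU visited, -1 before the first pass */
	int rto_loop;		/* loop generation already handled */
	int rto_loop_next;	/* loop generation requested by other CPUs */
};

/* Next set bit in mask strictly after prev, or NR_CPUS if there is none. */
static int mask_next(int prev, uint32_t mask)
{
	assert(prev >= -1);
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/* Model of the patched loop: CPUs in the mask without pushable tasks are skipped. */
static int rto_next_cpu_model(struct rd_model *rd)
{
	for (;;) {
		int cpu = mask_next(rd->rto_cpu, rd->rto_mask);

		rd->rto_cpu = cpu;
		if (cpu < NR_CPUS) {
			if (!(rd->pushable & (UINT32_C(1) << cpu)))
				continue;	/* the check added by the patch */
			return cpu;
		}
		rd->rto_cpu = -1;

		/* Wrapped around: go again only if a new loop was requested. */
		if (rd->rto_loop == rd->rto_loop_next)
			break;
		rd->rto_loop = rd->rto_loop_next;
	}
	return -1;
}
```

With CPU 2 and CPU 5 in the mask but only CPU 5 holding a pushable task, the
model skips CPU 2 and returns 5. The next call wraps around and returns -1
unless another loop was requested in the meantime.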
Now, looking at this again I still see invocations which are skipped due
to this patch, on an idle CPU more often than on a busy CPU. Given that the
task is removed from the list and the mask is cleared almost immediately,
this looks like a small window which is probably negligible.
One case I am not sure what to do about (from a busy trace):
| ksoftirqd/5-63 [005] dN.31 4446.750055: sched_waking: comm=rcu_preempt pid=17 prio=98 target_cpu=005
| ksoftirqd/5-63 [005] dN.41 4446.750058: enqueue_pushable_task: Add rcu_preempt-17
| ksoftirqd/5-63 [005] dN.41 4446.750059: enqueue_pushable_task: Set 5
Since the enqueued task is not yet on the CPU it gets added to the
pushable list (the task_current() check could be removed since an
enqueued task can never be on CPU, right?). Given the priorities, the new
task will preempt the current task.
| ksoftirqd/5-63 [005] dN.41 4446.750060: sched_wakeup: comm=rcu_preempt pid=17 prio=98 target_cpu=005
| ksoftirqd/5-63 [005] dN.31 4446.750062: sched_stat_runtime: comm=ksoftirqd/5 pid=63 runtime=14625 [ns]
| cyclictest-5192 [003] d..2. 4446.750062: sched_stat_runtime: comm=cyclictest pid=5192 runtime=13066 [ns]
| cyclictest-5192 [003] d..2. 4446.750064: dequeue_pushable_task: Del cyclictest-5192
| cyclictest-5192 [003] d..3. 4446.750065: rto_next_cpu.constprop.0: Look count 1
| cyclictest-5192 [003] d..3. 4446.750066: rto_next_cpu.constprop.0: Leave CPU 5
This is then observed by other CPUs in the system, so rto_next_cpu()
returns CPU 5 and rto_push_work gets scheduled on CPU5.
| ksoftirqd/5-63 [005] dNh1. 4446.750069: push_rt_task: Start
| ksoftirqd/5-63 [005] dNh1. 4446.750070: push_rt_task: Push rcu_preempt-17 98
| ksoftirqd/5-63 [005] dNh1. 4446.750071: push_rt_task: resched
push_rt_task() didn't do anything because Need-resched is already set.
| ksoftirqd/5-63 [005] dNh1. 4446.750071: rto_next_cpu.constprop.0: Look count 1
| ksoftirqd/5-63 [005] dNh1. 4446.750072: rto_next_cpu.constprop.0: Leave CPU 5
It nevertheless scheduled rto_push_work again.
| ksoftirqd/5-63 [005] dNh2. 4446.750074: push_rt_task: Start
| ksoftirqd/5-63 [005] dNh2. 4446.750074: push_rt_task: Push rcu_preempt-17 98
| ksoftirqd/5-63 [005] dNh2. 4446.750075: push_rt_task: resched
This run came to the same conclusion.
| ksoftirqd/5-63 [005] dNh2. 4446.750075: rto_next_cpu.constprop.0: Look count 1
| ksoftirqd/5-63 [005] dNh2. 4446.750076: rto_next_cpu.constprop.0: Leave no CPU count: 1
It left with no CPU because it wrapped around. Nothing scheduled.
| ksoftirqd/5-63 [005] dNh3. 4446.750077: sched_waking: comm=irq_work/5 pid=60 prio=98 target_cpu=005
| cyclictest-5216 [027] d..2. 4446.750077: dequeue_pushable_task: Del cyclictest-5216
| cyclictest-5216 [027] d..3. 4446.750079: rto_next_cpu.constprop.0: Look count 1
| ksoftirqd/5-63 [005] dNh4. 4446.750079: sched_wakeup: comm=irq_work/5 pid=60 prio=98 target_cpu=005
| cyclictest-5216 [027] d..3. 4446.750080: rto_next_cpu.constprop.0: Leave CPU 5
CPU5 is making progress in terms of scheduling but then CPU27 noticed
the mask and scheduled another rto_push_work.
| ksoftirqd/5-63 [005] dN.2. 4446.750084: dequeue_pushable_task: Del rcu_preempt-17
| ksoftirqd/5-63 [005] dN.2. 4446.750085: dequeue_pushable_task: Clear 5
| ksoftirqd/5-63 [005] d..2. 4446.750086: sched_switch: prev_comm=ksoftirqd/5 prev_pid=63 prev_prio=120 prev_state=R+ ==> next_comm=rcu_preempt next_pid=17 next_prio=98
| rcu_preempt-17 [005] d.h21 4446.750089: rto_next_cpu.constprop.0: Look count 0
| rcu_preempt-17 [005] d.h21 4446.750089: rto_next_cpu.constprop.0: Leave no CPU count: 0
This rto_next_cpu() was triggered earlier by CPU27.
At this point I'm not sure if there is something that could be done
about it or if it is a special case.
Would it make sense to avoid scheduling rto_push_work if rq->curr has
NEED_RESCHED set and make the scheduler do push_rt_task()?
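A rough sketch of what such a check could look like, as a userspace model
only. struct cpu_model and should_queue_push_work() are hypothetical names,
not existing kernel symbols, and the sketch assumes the reschedule that is
already pending would take care of the pushable task, which is exactly the
open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; the kernel would look at the rq and rq->curr. */
struct cpu_model {
	bool has_pushable;	/* a task sits on the pushable list */
	bool need_resched;	/* the current task already has NEED_RESCHED set */
};

/*
 * Decide whether it is worth sending rto_push_work to this CPU.
 * Assumption: if a reschedule is already pending, push_rt_task() would only
 * request another reschedule, so the IPI can be skipped.
 */
static bool should_queue_push_work(const struct cpu_model *c)
{
	assert(c != NULL);
	if (!c->has_pushable)
		return false;	/* nothing to push */
	if (c->need_resched)
		return false;	/* the scheduler is about to run anyway */
	return true;
}
```

In the trace above CPU5 would fall into the second case: rcu_preempt is on
the pushable list and ksoftirqd/5 already has NEED_RESCHED set, so none of
the three rto_push_work invocations would have been queued.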
Sebastian