Re: [RFC] sched/deadline: only mark active cpu as free

From: Doug Berger
Date: Tue Jan 14 2025 - 18:05:53 EST


On 1/13/2025 2:54 AM, Juri Lelli wrote:
Hi Doug,

On 10/01/25 15:30, Doug Berger wrote:
There is a hazard in the deadline scheduler where an offlined CPU
can have its free_cpus bit left set in the def_root_domain when
the schedutil cpufreq governor is used. This can allow a deadline
thread to be pushed to the runqueue of a powered down CPU which
breaks scheduling.

This commit works around the issue by only setting the free_cpus
bit for a CPU when it is "active". It is likely that the ordering
of sched_set_rq_online() and set_cpu_active() at the end of the
sched_cpu_deactivate() function should be revisited if this
approach has merit.

Signed-off-by: Doug Berger <opendmb@xxxxxxxxx>
---
<snip>
This problem appears to have been initially introduced by commit
120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control") which
moved the set_rq_offline() handling from sched_cpu_dying() to
sched_cpu_deactivate(). The original sequence allowed the free_cpus
bit to be forcibly cleared in the def_root_domains after all of the
scheduler dust settled. The new location makes the
sched_set_rq_offline() essentially meaningless for the deadline
scheduler since the managing of changed scheduling domains happens
later.

There are likely many different approaches to address this issue
and I'm hopeful that somone more familiar with the scheduler than
I can propose a better solution than the one suggested here.

Thank you for reading this far. Any advice is appreciated.

Thanks a lot for the detailed analysis!
Thanks for the review!


I actually fear that the issue is due to the cpudl_clear_freecpu() call
in rq_offline_dl() being racy, as we don't hold cp->lock while calling
that. So, I think your solution below might be almost correct. I am
thinking we should do something similar in cpudl_set() and remove cpudl_
{set,clear}_freecpu() calls altogether.

What do you think? If agree, care to update your patch please? :)
As I noted, I'm not real keen on what I proposed here. In particular I am wary of introducing any extra processing to normal deadline scheduler execution paths.

I am about to submit an alternative that only affects paths when the topology is changing that I hope you will take another look at ;).

I am still very much open to someone saying something like "You know we could probably bring the def_root_domain of the onlining CPU online earlier so that set_rq_offline() gets called for it when attaching to the new scheduling domain." I just don't feel comfortable enough yet with my understanding of the scheduler to know what all of the implications of such a change would be.


Best,
Juri

Thanks again!
Doug

kernel/sched/cpudeadline.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 95baa12a1029..6896bbe0e9ae 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -195,7 +195,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
cp->elements[cpu].idx = IDX_INVALID;
cpudl_heapify(cp, old_idx);
- cpumask_set_cpu(cpu, cp->free_cpus);
+ if (cpu_active(cpu))
+ cpumask_set_cpu(cpu, cp->free_cpus);
}
raw_spin_unlock_irqrestore(&cp->lock, flags);
}
--
2.34.1