Re: [PATCHv5] sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
From: Pingfan Liu
Date: Tue Nov 11 2025 - 06:59:13 EST
On Mon, Nov 10, 2025 at 05:08:56PM -0500, Waiman Long wrote:
> On 11/10/25 4:07 PM, Waiman Long wrote:
> > On 11/10/25 6:14 AM, Juri Lelli wrote:
> > > Hi,
> > >
> > > Looks like this has two issues.
> > >
> > > On 10/11/25 09:47, Pingfan Liu wrote:
> > >
> > > ...
> > >
> > > > +/*
> > > > + * This function always returns a non-empty bitmap in @cpus. This is because
> > > > + * if a root domain has reserved bandwidth for DL tasks, the DL bandwidth
> > > > + * check will prevent CPU hotplug from deactivating all CPUs in that domain.
> > > > + */
> > > > +static void dl_get_task_effective_cpus(struct task_struct *p, struct cpumask *cpus)
> > > > +{
> > > > + const struct cpumask *hk_msk;
> > > > +
> > > > + hk_msk = housekeeping_cpumask(HK_TYPE_DOMAIN);
> > > > + if (housekeeping_enabled(HK_TYPE_DOMAIN)) {
> > > > + if (!cpumask_intersects(p->cpus_ptr, hk_msk)) {
> > > > + /*
> > > > + * CPUs isolated by isolcpus=domain always belong to
> > > > + * def_root_domain.
> > > > + */
> > > > + cpumask_andnot(cpus, cpu_active_mask, hk_msk);
> > > > + return;
> > > > + }
> > > > + }
> > > > +
> > > > + /*
> > > > + * If a root domain holds a DL task, it must have active CPUs. So
> > > > + * active CPUs can always be found by walking up the task's cpuset
> > > > + * hierarchy up to the partition root.
> > > > + */
> > > > + cpuset_cpus_allowed(p, cpus);
> > > Grabs the callback_lock spin_lock (sleepable on RT) under the pi_lock
> > > raw_spin_lock.
> > I have been thinking about changing callback_lock to a raw_spinlock_t,
> > but need to find a good use case for this change. So it is a solvable
> > problem.
>
Thank you very much for being willing to accommodate this.
> Actually, we don't need to acquire the callback_lock if cpuset_mutex is
> held. So another possibility is to create a cpuset_cpus_allowed() variant
> that doesn't acquire the callback_lock but asserts that cpuset_mutex is
> held.
>
The real requirement is a read-side protected section extending from
dl_get_task_effective_cpus() to dl_b = &rq->rd->dl_bw;. Since there is
no handy lock that covers cpuset_cpus_allowed(), I chose the write-side
lock cpuset_mutex.
It would be ideal if cpuset_cpus_allowed() had a
cpuset_cpus_allowed_nolock() variant, and if callback_lock could be
changed to a raw_spinlock_t.
But if that use case is too trivial to justify the change, I could
instead move dl_get_task_effective_cpus() outside the pi_lock and
re-check task_cs(task) afterwards as an alternative.
Best Regards,
Pingfan