Re: [PATCHv5] sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug

From: Pingfan Liu

Date: Tue Nov 11 2025 - 06:48:47 EST


Hi Juri,

Thanks for your review. Please see the comments below.

On Mon, Nov 10, 2025 at 12:14:39PM +0100, Juri Lelli wrote:
> Hi,
>
> Looks like this has two issues.
>
> On 10/11/25 09:47, Pingfan Liu wrote:
>
> ...
>
> > +/*
> > + * This function always returns a non-empty bitmap in @cpus. This is because
> > + * if a root domain has reserved bandwidth for DL tasks, the DL bandwidth
> > + * check will prevent CPU hotplug from deactivating all CPUs in that domain.
> > + */
> > +static void dl_get_task_effective_cpus(struct task_struct *p, struct cpumask *cpus)
> > +{
> > +	const struct cpumask *hk_msk;
> > +
> > +	hk_msk = housekeeping_cpumask(HK_TYPE_DOMAIN);
> > +	if (housekeeping_enabled(HK_TYPE_DOMAIN)) {
> > +		if (!cpumask_intersects(p->cpus_ptr, hk_msk)) {
> > +			/*
> > +			 * CPUs isolated by isolcpus="domain" always belong to
> > +			 * def_root_domain.
> > +			 */
> > +			cpumask_andnot(cpus, cpu_active_mask, hk_msk);
> > +			return;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * If a root domain holds a DL task, it must have active CPUs. So
> > +	 * active CPUs can always be found by walking up the task's cpuset
> > +	 * hierarchy up to the partition root.
> > +	 */
> > +	cpuset_cpus_allowed(p, cpus);
>
> Grabs callback_lock spin_lock (sleepable on RT) under pi_lock
> raw_spin_lock.
>

Yes, it should be fixed. I'll discuss it in my reply to Waiman's email later.

> > +}
> > +
> > +/* The caller should hold cpuset_mutex */
> >  void dl_add_task_root_domain(struct task_struct *p)
> >  {
> >  	struct rq_flags rf;
> >  	struct rq *rq;
> >  	struct dl_bw *dl_b;
> > +	unsigned int cpu;
> > +	struct cpumask msk;
>
> Potentially huge mask allocated on the stack.
>

Since there is no way to handle a memory allocation failure in this
path, could we instead reserve the mask once, using alloc_cpumask_var()
in init_sched_dl_class(), and reuse it here?
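
To make the idea concrete, here is a rough userspace model of the
pattern, not kernel code and not taken from the patch. Every name in it
(mask_model, dl_scratch_mask, dl_scratch_init, dl_pick_cpu) is
hypothetical. It assumes all users of the scratch mask are serialized,
which in the patch would rely on the caller holding cpuset_mutex.

```c
/* Userspace model only; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS_MODEL 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS    (NR_CPUS_MODEL / BITS_PER_WORD)

/* Stand-in for struct cpumask: 512 bytes here, too big for a hot-path stack. */
struct mask_model {
	unsigned long bits[MASK_WORDS];
};

/* Reserved once at init time, then reused by every later call. */
static struct mask_model *dl_scratch_mask;

/* Models the init hook: the only place where allocation can fail. */
static int dl_scratch_init(void)
{
	dl_scratch_mask = calloc(1, sizeof(*dl_scratch_mask));
	return dl_scratch_mask ? 0 : -1;
}

static void mask_set(struct mask_model *m, unsigned int cpu)
{
	assert(cpu < NR_CPUS_MODEL);
	m->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

/* Returns the lowest set bit, or NR_CPUS_MODEL if the mask is empty. */
static unsigned int mask_first(const struct mask_model *m)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (m->bits[cpu / BITS_PER_WORD] & (1UL << (cpu % BITS_PER_WORD)))
			return cpu;
	return NR_CPUS_MODEL;
}

/*
 * Models the hot path: no allocation and no large stack object.
 * The caller must serialize access to dl_scratch_mask.
 */
static unsigned int dl_pick_cpu(const unsigned int *allowed, size_t n)
{
	memset(dl_scratch_mask, 0, sizeof(*dl_scratch_mask));
	for (size_t i = 0; i < n; i++)
		mask_set(dl_scratch_mask, allowed[i]);
	return mask_first(dl_scratch_mask);
}
```

In this model the out-of-range return value for an empty mask plays the
role of the BUG_ON(cpu >= nr_cpu_ids) check in the patch.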

Best Regards,

Pingfan
> >  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> >  	if (!dl_task(p) || dl_entity_is_special(&p->dl)) {
> > @@ -2891,16 +2923,22 @@ void dl_add_task_root_domain(struct task_struct *p)
> >  		return;
> >  	}
> >
> > -	rq = __task_rq_lock(p, &rf);
> > -
> > +	/*
> > +	 * Get an active rq, whose rq->rd traces the correct root
> > +	 * domain.
> > +	 * And the caller should hold cpuset_mutex, which guarantees
> > +	 * the cpu remaining in the cpuset until rq->rd is fetched.
> > +	 */
> > +	dl_get_task_effective_cpus(p, &msk);
> > +	cpu = cpumask_first_and(cpu_active_mask, &msk);
> > +	BUG_ON(cpu >= nr_cpu_ids);
> > +	rq = cpu_rq(cpu);
> >  	dl_b = &rq->rd->dl_bw;
> > -	raw_spin_lock(&dl_b->lock);
> >
> > +	raw_spin_lock(&dl_b->lock);
> >  	__dl_add(dl_b, p->dl.dl_bw, cpumask_weight(rq->rd->span));
> > -
> >  	raw_spin_unlock(&dl_b->lock);
> > -
> > -	task_rq_unlock(rq, p, &rf);
> > +	raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
>
> Thanks,
> Juri
>