Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier for hotplug

From: Juri Lelli
Date: Wed Feb 19 2025 - 05:06:29 EST


On 19/02/25 10:29, Dietmar Eggemann wrote:

...

> I did now.

Thanks!

> Patch-wise I have:
>
> (1) Putting 'fair_server's __dl_server_[de|at]tach_root() under if
> '(cpumask_test_cpu(rq->cpu, [old_rd->online|cpu_active_mask))' in
> rq_attach_root()
>
> https://lkml.kernel.org/r/Z7RhNmLpOb7SLImW@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> (2) Create __dl_server_detach_root() and call it in rq_attach_root()
>
> https://lkml.kernel.org/r/Z4fd_6M2vhSMSR0i@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> plus debug patch:
>
> https://lkml.kernel.org/r/Z6M5fQB9P1_bDF7A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> plus additional debug.

So you don't have the one with which we ignore special tasks while
rebuilding domains?

https://lore.kernel.org/all/Z6spnwykg6YSXBX_@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Could you please double check again against this branch?

git@xxxxxxxxxx:jlelli/linux.git experimental/dl-debug

> The suspend issue still persists.
>
> My hunch is that it's rather an issue with having 0 CPUs left in DEF
> while deactivating the last isol CPU (CPU3) so we set overflow = 1 w/o
> calling __dl_overflow(). We want to account fair_server_bw=52428
> against 0 CPUs.
>
> l B B l l l
>
> ^^^
> isolcpus=[3,4]
>
>
> cpumask_and(mask, rd->span, cpu_active_mask)
>
> mask = [3-5] & [0-3] = [3] -> dl_bw_cpus(3) = 1
>
> ---
>
> dl_bw_deactivate() called cpu=5
>
> dl_bw_deactivate() called cpu=4
>
> dl_bw_deactivate() called cpu=3
>
> dl_bw_cpus() cpu=6 rd->span=3-5 cpu_active_mask=0-3 cpus=1 type=DEF
> ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
> cpumask_subset(rd->span, cpu_active_mask) is false
>
> for_each_cpu_and(i, rd->span, cpu_active_mask)
> cpus++ <-- cpus is 1 !!!
>
> dl_bw_manage: cpu=3 cap=0 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=1 type=DEF span=3-5
^^^^^^
This still looks wrong: with a single CPU remaining we should only have
the corresponding dl server bandwidth present (unless there is some
other DL task running).
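
FWIW, the numbers in the trace can be cross-checked with a small userspace
sketch. This is not kernel code: cpu_range() and count_cpus_and() are
made-up helpers standing in for cpumasks, and the 50ms/1s fair server
runtime/period is an assumption on my side (it does reproduce the 52428
seen above).

```c
#include <assert.h>
#include <stdint.h>

#define BW_SHIFT 20	/* fixed-point shift: 1 << 20 is 100% of one CPU */

/* Bandwidth as a fixed-point ratio, in the spirit of the kernel's to_ratio(). */
static uint64_t to_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/* Bitmask with CPUs lo..hi set (hypothetical stand-in for a cpumask). */
static uint64_t cpu_range(unsigned int lo, unsigned int hi)
{
	uint64_t m = 0;

	for (unsigned int i = lo; i <= hi; i++)
		m |= 1ULL << i;
	return m;
}

/* Count CPUs present in both masks, like for_each_cpu_and() { cpus++; }. */
static int count_cpus_and(uint64_t span, uint64_t active)
{
	int cpus = 0;

	for (uint64_t m = span & active; m; m &= m - 1)
		cpus++;
	return cpus;
}

/* Re-derive the values printed in the debug output quoted above. */
static void check_trace_numbers(void)
{
	/* Assumed fair server parameters: 50ms runtime every 1s. */
	uint64_t fair_server_bw = to_ratio(1000000000ULL, 50000000ULL);

	assert(fair_server_bw == 52428);
	/* span=3-5, cpu_active_mask=0-3: only CPU3 is in both. */
	assert(count_cpus_and(cpu_range(3, 5), cpu_range(0, 3)) == 1);
	/* total_bw=104856 is exactly two fair servers' worth. */
	assert(2 * fair_server_bw == 104856);
}
```

So total_bw matches two dl servers while dl_bw_cpus() reports one CPU,
which is the mismatch I'm pointing at.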

If you already had the patch ignoring the sugov bandwidth in your set, could
you please share the full dmesg?

Thanks!