Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention
From: Peter Zijlstra
Date: Thu Apr 02 2026 - 06:56:15 EST
On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
> It is still not super clear to me how the logic deals with more than
> 128 CPUs in a DIE domain, because that'll need more than the u64, but
> sbm_find_next_bit() simply does:
>
> tmp = leaf->bitmap & mask; /* All are u64 */
>
> expecting just the u64 bitmap to represent all the CPUs in the leaf.
>
> If we have, say, 256 CPUs per DIE, we get shift(7) and arch_sbm_mask
> as 7f (127), which allows a leaf to hold more than 64 CPUs, but we are
> using the "u64 bitmap" directly and not:
>
> find_next_bit(bitmap, arch_sbm_mask)
>
> Am I missing something here?
Nope. That logic just isn't there; it was left as an exercise for the
reader :-)
For AMD in particular it would be good to have one leaf per CCD, but
since CCDs are not enumerated in your topology (they really should be), I
didn't do that.
Now, I seem to remember we had this discussion some time in the past,
and you had some hacks available.
Anyway, the whole premise was to have one leaf/cacheline per cache, such
that the high-frequency atomic set/clear-bit ops don't bounce the line
around.
I took the nohz bitmap because it was relatively simple and is known to
suffer from contention under certain workloads.