Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention
From: Peter Zijlstra
Date: Thu Apr 02 2026 - 06:56:15 EST
On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
> It is still not super clear to me how the logic deals with more than
> 128 CPUs in a DIE domain, because that'll need more than the u64, but
> sbm_find_next_bit() simply does:
>
> tmp = leaf->bitmap & mask; /* All are u64 */
>
> expecting just the u64 bitmap to represent all the CPUs in the leaf.
>
> If we have, say, 256 CPUs per DIE, we get shift(7) and arch_sbm_mask
> as 7f (127), which allows a leaf to hold more than 64 CPUs, but we are
> using the "u64 bitmap" directly and not:
>
> find_next_bit(bitmap, arch_sbm_mask)
>
> Am I missing something here?
Nope. That logic just isn't there; it was left as an exercise for the
reader :-)
For AMD in particular it would be good to have one leaf per CCD, but
since CCDs are not enumerated in your topology (they really should be), I
didn't do that.
Now, I seem to remember we had this discussion some time in the past,
and you had some hacks available.
Anyway, the whole premise was to have one leaf/cacheline per cache, such
that the high-frequency atomic set/clear-bit ops don't bounce the line
around.
I took the nohz bitmap because it was relatively simple and is known to
suffer from contention under certain workloads.