Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention

From: Chen, Yu C

Date: Fri Apr 03 2026 - 01:49:03 EST


On 4/2/2026 7:06 PM, K Prateek Nayak wrote:
> Hello Peter,
>
> On 4/2/2026 4:25 PM, Peter Zijlstra wrote:
>> On Thu, Apr 02, 2026 at 10:11:11AM +0530, K Prateek Nayak wrote:
>>
>>> It is still not super clear to me how the logic deals with more than
>>> 128 CPUs in a DIE domain, because that'll need more than the u64, but
>>> sbm_find_next_bit() simply does:
>>>
>>> tmp = leaf->bitmap & mask; /* All are u64 */
>>>
>>> expecting just the u64 bitmap to represent all the CPUs in the leaf.
>>>
>>> If we have, say, 256 CPUs per DIE, we get shift (7) and arch_sbm_mask
>>> as 7f (127), which allows a leaf more than 64 CPUs, but we are
>>> using the "u64 bitmap" directly and not:
>>>
>>> find_next_bit(bitmap, arch_sbm_mask)
>>>
>>> Am I missing something here?
>>
>> Nope. That logic just isn't there, that was left as an exercise to the
>> reader :-)
>
> Ack! Let me go fiddle with that.


Nice catch. I hadn't noticed this because we have fewer than
64 CPUs per die on our systems. Please feel free to send the patches
my way once they are available.

And regarding your other question about the calculation of arch_sbm_shift,
I'm trying to understand why there is a subtraction of 1. Should it be:

- arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN] - 1;
+ arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN - 1];
?
Are we trying to filter the raw globally unique die id - similar to
topo_apicid(), which masks off the lower x86_topo_system.dom_shifts[dom - 1]
bits?

With the above change I get the correct number of leaves (4) rather than
(2) as in the original version.

thanks,
Chenyu