Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention

In reply to: Tim Chen: "Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention"

From: K Prateek Nayak

Date: Tue Apr 07 2026 - 23:07:24 EST

Hello Tim,

On 4/8/2026 2:05 AM, Tim Chen wrote:
>> And regarding your other question about the calculation of arch_sbm_shift,
>> I'm trying to understand why there is a subtraction of 1, should it be:
>> - arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN] - 1;
>> + arch_sbm_shift = x86_topo_system.dom_shifts[TOPO_DIE_DOMAIN - 1];
>
> Perhaps something like
>
> arch_sbm_shift = min(sizeof(unsigned long),
> topology_get_domain_shift(TOPO_TILE_DOMAIN));
>
> to take care of both AMD system and the 64 bit leaf bitmask limit?

Ack! But do we want to separate CPUs on the same LLC domain across
different cachelines in 64-CPU chunks, or should we use the rest
of the padding to represent them?
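
For illustration only, the clamp being discussed can be sketched in
userspace C as below. This is not kernel code: LEAF_BITS, log2_pow2(),
clamp_sbm_shift() and cpu_to_shard() are hypothetical names, and the
sketch assumes the cap is meant as a shift (log2 of the 64 bits in one
leaf word) rather than a byte count.

```c
#include <stddef.h>

/* Assumption: one leaf bitmask word holds 64 CPUs (64-bit unsigned long). */
#define LEAF_BITS 64u

/* log2 of a power-of-two value (hypothetical helper). */
static unsigned int log2_pow2(size_t v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/*
 * Clamp a topology domain shift so that one shard never spans more
 * CPUs than fit in a single leaf word (hypothetical helper).
 */
static unsigned int clamp_sbm_shift(unsigned int domain_shift)
{
	unsigned int leaf_shift = log2_pow2(LEAF_BITS);

	return domain_shift < leaf_shift ? domain_shift : leaf_shift;
}

/* Shard index for a CPU, assuming shards are contiguous CPU ranges. */
static unsigned int cpu_to_shard(unsigned int cpu, unsigned int shift)
{
	return cpu >> shift;
}
```

Under these assumptions, a 256-CPU LLC (domain shift of 8) would be
clamped to a shift of 6 and split into four 64-CPU shards, each of
which could sit on its own cacheline.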

I'm collecting some performance numbers to see if it makes any
difference under high contention, but have you seen benefits of
sharding the mask further when there are hundreds of CPUs on the
same LLC?

--
Thanks and Regards,
Prateek