Re: [PATCH v2 4/4] sched/rt: Split cpupri_vec->cpumask to per NUMA node to reduce contention

From: Chen, Yu C

Date: Tue Mar 31 2026 - 01:38:27 EST


On 3/24/2026 8:00 PM, Peter Zijlstra wrote:
On Mon, Mar 23, 2026 at 11:45:01AM -0700, Tim Chen wrote:
On Fri, 2026-03-20 at 13:40 +0100, Peter Zijlstra wrote:
On Mon, Jul 21, 2025 at 02:10:26PM +0800, Pan Deng wrote:

This change splits `cpupri_vec->cpumask` into per-NUMA-node data to
mitigate false sharing.
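The idea can be illustrated with a small userspace sketch. Everything below is invented for illustration: `NR_NODES`, `CPUS_PER_NODE`, and the `cpupri_vec_split` layout are assumptions, not the patch's actual data structures; only the starting point (a `cpupri_vec` pairing a count with one global cpumask, see kernel/sched/cpupri.h) matches the kernel.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define NR_NODES      2		/* hypothetical topology: 2 NUMA nodes */
#define CPUS_PER_NODE 64

/* One bitmap per NUMA node, each padded out to its own cache line so
 * writers on one node stop bouncing the line that other nodes poll. */
struct node_mask {
	alignas(64) unsigned long bits;
};

/* Rough userspace analogue of splitting cpupri_vec->cpumask per node. */
struct cpupri_vec_split {
	atomic_int       count;
	struct node_mask mask[NR_NODES];
};

static void vec_set_cpu(struct cpupri_vec_split *vec, int cpu)
{
	/* only the owning node's cache line is dirtied */
	vec->mask[cpu / CPUS_PER_NODE].bits |= 1UL << (cpu % CPUS_PER_NODE);
	atomic_fetch_add(&vec->count, 1);
}

static int vec_test_cpu(struct cpupri_vec_split *vec, int cpu)
{
	return !!(vec->mask[cpu / CPUS_PER_NODE].bits &
		  (1UL << (cpu % CPUS_PER_NODE)));
}
```

Readers scanning for a CPU now touch one line per node instead of all nodes hammering a single shared mask.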

So I really do think we need something here. We're running into the
whole cpumask contention thing on a semi-regular basis.


[ ... ]

+
+unsigned int sbm_find_next_bit(struct sbm *sbm, int start)
+{
+	struct sbm_leaf *leaf = (void *)sbm;
+	struct sbm_root *root = (void *)sbm;
+	int nr = start >> arch_sbm_shift;
+	int bit = start & arch_sbm_mask;
+	unsigned long tmp, mask = (~0UL) << bit;
+
+	if (sbm->type == st_root) {
+		for (; nr < arch_sbm_leafs; nr++, mask = ~0UL) {
+			leaf = root->leafs[nr];
+			tmp = leaf->bitmap & mask;
+			if (!tmp)
+				continue;

I suppose this should be

	if (tmp)
		break;

otherwise:
[ 40.071616] watchdog: BUG: soft lockup - CPU#0 stuck for 30s! [swapper/0:0]
[ 40.071616] Modules linked in:
[ 40.071616] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-rc5-sbm-+ #16 PREEMPT(full)
[ 40.071616] RIP: 0010:sbm_find_next_bit+0x2a/0xa0

+		}
+	} else {
+		tmp = leaf->bitmap & mask;
+	}
+
+	if (!tmp)
+		return -1;
+	return (nr << arch_sbm_shift) | __ffs(tmp);
+}
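For reference, here is a userspace sketch of the corrected two-level walk. The `sbm` type layouts, `SBM_SHIFT`, and `SBM_LEAFS` below are assumptions reconstructed from the quoted snippet (standing in for the `arch_sbm_*` parameters), not the actual patch:

```c
#include <assert.h>

#define SBM_SHIFT 6				/* 64 bits per leaf */
#define SBM_MASK  ((1 << SBM_SHIFT) - 1)
#define SBM_LEAFS 4				/* hypothetical leaf count */

enum sbm_type { st_leaf, st_root };

struct sbm { enum sbm_type type; };		/* common header, first member */

struct sbm_leaf {
	struct sbm sbm;
	unsigned long bitmap;
};

struct sbm_root {
	struct sbm sbm;
	struct sbm_leaf *leafs[SBM_LEAFS];
};

/* int rather than unsigned int here, so -1 is a clean miss */
int sbm_find_next_bit(struct sbm *sbm, int start)
{
	struct sbm_leaf *leaf = (void *)sbm;
	struct sbm_root *root = (void *)sbm;
	int nr = start >> SBM_SHIFT;
	int bit = start & SBM_MASK;
	unsigned long tmp = 0, mask = (~0UL) << bit;

	if (sbm->type == st_root) {
		for (; nr < SBM_LEAFS; nr++, mask = ~0UL) {
			leaf = root->leafs[nr];
			tmp = leaf->bitmap & mask;
			if (tmp)	/* hit: stop, keeping nr and tmp */
				break;	/* the posted code continued here */
		}
	} else {
		tmp = leaf->bitmap & mask;
	}
	if (!tmp)
		return -1;
	/* __builtin_ctzl plays the role of the kernel's __ffs() */
	return (nr << SBM_SHIFT) | __builtin_ctzl(tmp);
}
```

With the posted `continue`, a hit in any non-final leaf is skipped over, so the function keeps scanning and can return a wrong index or -1, which presumably is what drives the caller into the soft lockup reported above.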

An update on testing: with the above change, I ran a simple hackbench
test on a system with multiple LLCs within one node. The benefit is
significant (+12% to +30%) when the system is under-loaded, but there
is some regression (-10%) when it is overloaded (still need to figure
out why).

thanks,
Chenyu