RE: Re: [PATCH] sched/fair: Optimize CPU iteration using for_each_cpu_and[not]

From: Li,Rongqing
Date: Thu Aug 21 2025 - 08:21:02 EST




> On 15/08/25 09:15, lirongqing wrote:
> > From: Li RongQing <lirongqing@xxxxxxxxx>
> >
> > Replace open-coded CPU iteration patterns with more efficient
> > for_each_cpu_and() and for_each_cpu_andnot() macros in three locations.
> >
> > This change both simplifies the code and provides minor performance
> > improvements by using the more specialized iteration macros.
> >
>
> TBF I'm not sure it does improve anything for the SMT cases considering we
> don't see much more than SMT8.
>

I ran the simple test below on a 128-CPU, SMT-2 machine, and the result shows that for_each_cpu_andnot() is faster:

Elapsed local_clock() time in ns for 1000 loops:

for_each_cpu + if()    vs    for_each_cpu_andnot()
5026373                vs    3398283
4034229                vs    2711302
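For a sense of scale, the relative saving can be worked out from the figures above. The helper below is only an illustrative sketch; percent_saved() is a hypothetical name and is not part of the test module.

```c
/*
 * Hypothetical helper: percentage of time saved when going from
 * `before` to `after`, both given in the same unit.
 */
static double percent_saved(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Applied to either row of the results, percent_saved() gives a value between 32 and 33, so the for_each_cpu_andnot() loop took roughly a third less time in this particular test.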



#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/cpumask.h>
#include <linux/topology.h>
#include <linux/sched/clock.h>

static int __init test_init(void)
{
	int cpu, sibling;
	int i = 0;
	int loop = 1000;
	u64 now;

	/* Baseline: for_each_cpu() plus an explicit skip of the CPU itself */
	now = local_clock();
	while (loop--) {
		for (cpu = 0; cpu < 128; cpu++) {	/* test machine has 128 CPUs */
			for_each_cpu(sibling, cpu_smt_mask(cpu)) {
				if (cpu == sibling)
					continue;
				i++;
			}
		}
	}
	printk("%llu %d\n", local_clock() - now, i);

	i = 0;
	loop = 1000;

	/* Same walk, with the CPU itself masked out by for_each_cpu_andnot() */
	now = local_clock();
	while (loop--) {
		for (cpu = 0; cpu < 128; cpu++) {
			for_each_cpu_andnot(sibling, cpu_smt_mask(cpu), cpumask_of(cpu)) {
				i++;
			}
		}
	}
	printk("%llu %d\n", local_clock() - now, i);

	/* Return an error so the module does not stay loaded after the test */
	return -1;
}

module_init(test_init);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");
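The difference between the two loops can be modelled in user space. This is only a sketch under stated assumptions: a 64-bit word (mask_t) stands in for struct cpumask, the function names are hypothetical, and __builtin_ctzll() is a GCC/Clang builtin. It is not how the kernel macros are implemented.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t mask_t;

/* Pattern 1: visit every set bit and skip the CPU itself inside the loop. */
static int count_siblings_skip(mask_t smt_mask, int cpu)
{
	int count = 0;

	/* m &= m - 1 clears the lowest set bit on each iteration */
	for (mask_t m = smt_mask; m; m &= m - 1) {
		int sibling = __builtin_ctzll(m);	/* index of lowest set bit */

		if (sibling == cpu)
			continue;
		count++;
	}
	return count;
}

/* Pattern 2: clear the excluded bits first, then visit what is left. */
static int count_siblings_andnot(mask_t smt_mask, mask_t exclude)
{
	int count = 0;

	for (mask_t m = smt_mask & ~exclude; m; m &= m - 1)
		count++;
	return count;
}
```

Both functions return the same count for the same input, for example count_siblings_skip(0x3, 0) and count_siblings_andnot(0x3, 1ULL << 0). The second form never visits the excluded CPU, so the loop body needs no comparison and runs one iteration less per mask.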

Thanks

-Li





> The task_numa_find_cpu() one I do agree makes things better.
>
> > Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
>
> Reviewed-by: Valentin Schneider <vschneid@xxxxxxxxxx>