Re: [PATCH 2/2] x86/resctrl: Don't workqueue local event counter reads
From: Fenghua Yu
Date: Mon Nov 04 2024 - 18:59:03 EST
Hi, Tony,
On 11/4/24 14:56, Luck, Tony wrote:
cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
To a large degree Peter's patch is working around an inefficiency in this
housekeeping call.
Code may be running on a suitable CPU from the domain cpumask, but this
call will very likely pick the first CPU in that mask, rather than the current one.
Agree.
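To make the point concrete, here is a rough user-space sketch (not kernel code) of the two selection policies. It assumes a plain 64-bit bitmask as a stand-in for the domain cpumask. toy_mask_t, toy_first_cpu() and toy_pick_prefer_current() are hypothetical names invented for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPU (-1)

/* Toy stand-in for a cpumask: bit n set means CPU n is in the domain. */
typedef uint64_t toy_mask_t;

/* "Pick any" modelled as: lowest-numbered CPU in the mask, or NO_CPU if empty. */
static int toy_first_cpu(toy_mask_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/* Prefer the caller's CPU when it is in the mask; otherwise fall back to the first. */
static int toy_pick_prefer_current(toy_mask_t mask, int cur)
{
	assert(cur >= 0 && cur < 64);
	if (mask & (UINT64_C(1) << cur))
		return cur;
	return toy_first_cpu(mask);
}
```

With a domain of CPUs 2 and 3 (mask 0x0c) and the caller on CPU 3, the first policy returns CPU 2 and forces a cross-CPU call, while the second returns CPU 3 and needs none.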
From that point it's all downhill unless you are lucky enough that the first
CPU is a tick_nohz_full_cpu() one and you take the
smp_call_function_any(cpumask, mon_event_count, rr, 1);
Whenever this function is called, performance is degraded rather than
improved, because the current patch makes extra get_cpu()/put_cpu() calls
in its fast path.
On platforms that have fewer housekeeping CPUs (e.g. an RT platform),
there is a higher chance that the first CPU is a nohz_full CPU and that
smp_call_function_any() is run.
path. It seems that on many systems you'll take the
smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
path and make a pointless IPI to get the data.
Yes, that's right. But that doesn't conflict with my suggested change.
What I suggested is to move the fast path code to this case only. The
fast path is then still checked/called in both cases if the condition is met:
1. In the nohz_full case, it's already checked/called inside
smp_call_function_any(). No need to call out the fast path separately.
- No extra get_cpu() and put_cpu() are called.
- The performance is better than with the current patch.
2. In the non-nohz_full case, it's called out explicitly. No performance
difference from the current patch.
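A rough user-space model of the flow described in the two cases above, only to illustrate the structure. It is not the patch itself. The enum, toy_test_cpu() and toy_choose_path() are hypothetical names, the domain cpumask is modelled as a 64-bit bitmask, and whether the picked CPU is nohz_full is passed in as a plain flag instead of calling tick_nohz_full_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which way the counter read ends up being done in this toy model. */
enum toy_path {
	TOY_LOCAL_VIA_ANY,	/* smp_call_function_any() runs it on the current CPU */
	TOY_IPI_VIA_ANY,	/* smp_call_function_any() has to IPI another CPU */
	TOY_LOCAL_FAST_PATH,	/* explicit local read, only in the non-nohz_full case */
	TOY_CALL_ON_CPU,	/* smp_call_on_cpu() on the picked CPU */
};

/* True if CPU 'cpu' is set in the toy bitmask. */
static bool toy_test_cpu(uint64_t mask, int cpu)
{
	return cpu >= 0 && cpu < 64 && (mask & (UINT64_C(1) << cpu)) != 0;
}

/*
 * mask:                 CPUs of the domain (toy bitmask)
 * cur:                  CPU the caller is running on
 * picked_is_nohz_full:  assumed result of the nohz_full check on the picked CPU
 */
static enum toy_path toy_choose_path(uint64_t mask, int cur,
				     bool picked_is_nohz_full)
{
	assert(mask != 0);

	if (picked_is_nohz_full) {
		/* Case 1: the local check already happens inside the call. */
		return toy_test_cpu(mask, cur) ? TOY_LOCAL_VIA_ANY
					       : TOY_IPI_VIA_ANY;
	}

	/* Case 2: the fast path is called out explicitly here only. */
	return toy_test_cpu(mask, cur) ? TOY_LOCAL_FAST_PATH
				       : TOY_CALL_ON_CPU;
}
```

In this model a caller already running inside the domain never triggers a cross-CPU call in either case, and the explicit local check sits only in the branch that would otherwise use smp_call_on_cpu().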
Thanks.
-Fenghua