Re: [PATCH] x86/alternatives: remove false sharing in poke_int3_handler()
From: Ingo Molnar
Date: Sun Mar 23 2025 - 17:38:32 EST

* Eric Dumazet <edumazet@xxxxxxxxxx> wrote:

> eBPF programs can be run 20,000,000+ times per second on busy servers.
>
> Whenever /proc/sys/kernel/bpf_stats_enabled is turned off,
> hundreds of call sites are patched from text_poke_bp_batch()
> and we see a critical loss of performance due to false sharing
> on bp_desc.refs lasting up to three seconds.
> @@ -2413,8 +2415,12 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
>  	/*
>  	 * Remove and wait for refs to be zero.
>  	 */
> -	if (!atomic_dec_and_test(&bp_desc.refs))
> -		atomic_cond_read_acquire(&bp_desc.refs, !VAL);
> +	for_each_possible_cpu(i) {
> +		atomic_t *refs = per_cpu_ptr(&bp_refs, i);
> +
> +		if (!atomic_dec_and_test(refs))
> +			atomic_cond_read_acquire(refs, !VAL);
> +	}

So your patch changes text_poke_bp_batch() to busy-spin-wait for
bp_refs to go to zero on all 480 CPUs.
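
To make the shape of that wait concrete, here is a minimal userspace
sketch of the pattern in the quoted hunk. It uses C11 atomics instead of
the kernel's atomic_t and per-CPU APIs, so it is an illustration under
stated assumptions, not the kernel code. NR_SLOTS, CACHELINE, slot_refs
and all four function names are hypothetical, and the handler side
(enter/exit) is an assumption about how references are taken, since the
quoted hunk only shows the patching side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS  4   /* hypothetical stand-in for the number of possible CPUs */
#define CACHELINE 64  /* assumed cache line size */

/* One counter per slot, each on its own cache line to avoid false sharing. */
struct slot_ref {
	_Alignas(CACHELINE) atomic_int refs;
};

static struct slot_ref slot_refs[NR_SLOTS];

/* Patcher side: publish one initial reference on every slot. */
static void refs_enable(void)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_store(&slot_refs[i].refs, 1);
}

/* Handler side (assumed): take a reference only if the slot is still live. */
static bool handler_enter(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	int v = atomic_load_explicit(&slot_refs[slot].refs, memory_order_relaxed);

	while (v != 0) {
		/* On failure, v is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak_explicit(&slot_refs[slot].refs,
							  &v, v + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Handler side (assumed): drop the reference taken by handler_enter(). */
static void handler_exit(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	atomic_fetch_sub_explicit(&slot_refs[slot].refs, 1, memory_order_release);
}

/*
 * Patcher side: drop the initial reference on each slot in turn and
 * busy-wait until that slot reaches zero before moving to the next one.
 * Returns how many slots still had a handler in flight.
 */
static int refs_disable_and_wait(void)
{
	int waited = 0;

	for (int i = 0; i < NR_SLOTS; i++) {
		/* fetch_sub returns the old value; 1 means we were the last ref. */
		if (atomic_fetch_sub_explicit(&slot_refs[i].refs, 1,
					      memory_order_acq_rel) != 1) {
			waited++;
			while (atomic_load_explicit(&slot_refs[i].refs,
						    memory_order_acquire) != 0)
				; /* busy-spin, as in atomic_cond_read_acquire() */
		}
	}
	return waited;
}
```

In this sketch the handlers no longer contend on one shared counter, but
the patcher visits every slot sequentially and may spin on each one, so
the cost of the wait grows with the number of slots.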

Your measurement is using /proc/sys/kernel/bpf_stats_enabled on a
single CPU, right?

What's the adversarial workload here? Spamming bpf_stats_enabled on all
CPUs in parallel? Or mixing it with some other text_poke_bp_batch()
user if bpf_stats_enabled serializes access?

Does anything undesirable happen in that case?

Thanks,

Ingo