Re: [PATCH] uprobes: switch to RCU Tasks Trace flavor for better performance

From: Andrii Nakryiko
Date: Fri Sep 13 2024 - 17:37:18 EST


On Tue, Sep 10, 2024 at 10:43 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
>
> This patch switches uprobes SRCU usage to the RCU Tasks Trace flavor,
> which is optimized for more lightweight and quick readers (at the expense
> of slower writers, which for uprobes is a fine tradeoff) and has better
> performance and scalability with the number of CPUs.
>
> As with the baseline vs SRCU comparison, we've benchmarked the SRCU-based
> implementation against the RCU Tasks Trace implementation.
>
> SRCU
> ====
> uprobe-nop ( 1 cpus): 3.276 ± 0.005M/s ( 3.276M/s/cpu)
> uprobe-nop ( 2 cpus): 4.125 ± 0.002M/s ( 2.063M/s/cpu)
> uprobe-nop ( 4 cpus): 7.713 ± 0.002M/s ( 1.928M/s/cpu)
> uprobe-nop ( 8 cpus): 8.097 ± 0.006M/s ( 1.012M/s/cpu)
> uprobe-nop (16 cpus): 6.501 ± 0.056M/s ( 0.406M/s/cpu)
> uprobe-nop (32 cpus): 4.398 ± 0.084M/s ( 0.137M/s/cpu)
> uprobe-nop (64 cpus): 6.452 ± 0.000M/s ( 0.101M/s/cpu)
>
> uretprobe-nop ( 1 cpus): 2.055 ± 0.001M/s ( 2.055M/s/cpu)
> uretprobe-nop ( 2 cpus): 2.677 ± 0.000M/s ( 1.339M/s/cpu)
> uretprobe-nop ( 4 cpus): 4.561 ± 0.003M/s ( 1.140M/s/cpu)
> uretprobe-nop ( 8 cpus): 5.291 ± 0.002M/s ( 0.661M/s/cpu)
> uretprobe-nop (16 cpus): 5.065 ± 0.019M/s ( 0.317M/s/cpu)
> uretprobe-nop (32 cpus): 3.622 ± 0.003M/s ( 0.113M/s/cpu)
> uretprobe-nop (64 cpus): 3.723 ± 0.002M/s ( 0.058M/s/cpu)
>
> RCU Tasks Trace
> ===============
> uprobe-nop ( 1 cpus): 3.396 ± 0.002M/s ( 3.396M/s/cpu)
> uprobe-nop ( 2 cpus): 4.271 ± 0.006M/s ( 2.135M/s/cpu)
> uprobe-nop ( 4 cpus): 8.499 ± 0.015M/s ( 2.125M/s/cpu)
> uprobe-nop ( 8 cpus): 10.355 ± 0.028M/s ( 1.294M/s/cpu)
> uprobe-nop (16 cpus): 7.615 ± 0.099M/s ( 0.476M/s/cpu)
> uprobe-nop (32 cpus): 4.430 ± 0.007M/s ( 0.138M/s/cpu)
> uprobe-nop (64 cpus): 6.887 ± 0.020M/s ( 0.108M/s/cpu)
>
> uretprobe-nop ( 1 cpus): 2.174 ± 0.001M/s ( 2.174M/s/cpu)
> uretprobe-nop ( 2 cpus): 2.853 ± 0.001M/s ( 1.426M/s/cpu)
> uretprobe-nop ( 4 cpus): 4.913 ± 0.002M/s ( 1.228M/s/cpu)
> uretprobe-nop ( 8 cpus): 5.883 ± 0.002M/s ( 0.735M/s/cpu)
> uretprobe-nop (16 cpus): 5.147 ± 0.001M/s ( 0.322M/s/cpu)
> uretprobe-nop (32 cpus): 3.738 ± 0.008M/s ( 0.117M/s/cpu)
> uretprobe-nop (64 cpus): 4.397 ± 0.002M/s ( 0.069M/s/cpu)
>
> Peak throughput for uprobes increases from 8.1 mln/s to 10.4 mln/s
> (+28%!), and for uretprobes from 5.3 mln/s to 5.9 mln/s (+11%); the
> gain is smaller for uretprobes because there is more work to do on the
> uretprobes side.
>
> Even single-thread (no contention) performance is slightly better: 3.276
> mln/s to 3.396 mln/s (+3.7%) for uprobes, and 2.055 mln/s to 2.174 mln/s
> (+5.8%) for uretprobes.
>
> We also select TASKS_TRACE_RCU for UPROBES in Kconfig due to the new
> dependency.
>
> Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---
> arch/Kconfig | 1 +
> kernel/events/uprobes.c | 38 ++++++++++++++++----------------------
> 2 files changed, 17 insertions(+), 22 deletions(-)
>

Ping, just in case this slipped through the cracks (and is not simply
waiting its turn to be applied). It would be nice for this patch to go
in together with the rest of the uprobe patches from the original patch
set. Thanks!

> diff --git a/arch/Kconfig b/arch/Kconfig
> index 975dd22a2dbd..a0df3f3dc484 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -126,6 +126,7 @@ config KPROBES_ON_FTRACE
>  config UPROBES
>  	def_bool n
>  	depends on ARCH_SUPPORTS_UPROBES
> +	select TASKS_TRACE_RCU
>  	help
>  	  Uprobes is the user-space counterpart to kprobes: they
>  	  enable instrumentation applications (such as 'perf probe')

[...]