Re: [PATCH v2 00/11] perf/uprobe: Optimize uprobes

From: Andrii Nakryiko
Date: Fri Jul 12 2024 - 00:58:06 EST


On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> Hi!
>
> These patches implement the (S)RCU based proposal to optimize uprobes.
>
> On my c^Htrusty old IVB-EP -- where each (of the 40) CPU calls 'func' in a
> tight loop:
>
> perf probe -x ./uprobes test=func
> perf stat -ae probe_uprobe:test -- sleep 1
>
> perf probe -x ./uprobes test=func%return
> perf stat -ae probe_uprobe:test__return -- sleep 1
>
> PRE:
>
> 4,038,804 probe_uprobe:test
> 2,356,275 probe_uprobe:test__return
>
> POST:
>
> 7,216,579 probe_uprobe:test
> 6,744,786 probe_uprobe:test__return
>
> (copy-paste FTW, I didn't do new numbers because the fast paths didn't change --
> and quick test run shows similar numbers)
>
> Patches also available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes
>
>
> Changes since last time:
> - better split with intermediate inc_not_zero()
> - fix UPROBE_HANDLER_REMOVE
> - restored the lost rcu_assign_pointer()
> - avoid lockdep for uretprobe_srcu
> - add missing put_uprobe() -> srcu_read_unlock() conversion
> - actually initialize return_instance::has_ref
> - a few comments
> - things I don't remember
>
>

Hey Peter!

Thanks for the v2, I plan to look at it more thoroughly tomorrow. But
meanwhile I spent a good chunk of today writing an uprobes
stress test, so we can validate that we are not regressing anything
(yes, I don't trust lockless code and people in general ;)

Anyways, if you'd like to use it, it's at [0]. All you should need to
build and run it is:

$ cd examples/c
$ make -j$(nproc) uprobe-stress
$ sudo ./uprobe-stress -tN -aM -mP -fR


N, M, P, and R are the numbers of threads dedicated to each of the four
functions of the stress test: triggering user space functions (N),
attaching/detaching various random subsets of uprobes (M), mmap()ing
parts of the executable with uprobes (P), and forking the process and
triggering uprobes for a little bit (R). The idea is to test various
timings and interleavings of uprobe-related logic.

You should only need a not-too-old Clang to build everything (Clang 12+
should work, I believe). But do let me know if you run into trouble.

I did run this stress test for a little while on current
bpf-next/master with no issues detected (yay!).

But then I also ran it on Linux built from the perf/uprobes branch (these
patches), and after a few seconds I saw that no more
attachment/detachment was happening. Eventually I got splats, which you
can see in [1]. I used the `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
to run it inside my QEMU image.

So there is still something off, hopefully this will help to debug and
hammer out any remaining kinks. Thanks!

[0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3
[1] https://gist.github.com/anakryiko/f761690addf7aa5f08caec95fda9ef1a