[PATCH v2 00/11] perf/uprobe: Optimize uprobes

From: Peter Zijlstra
Date: Thu Jul 11 2024 - 07:08:11 EST


Hi!

These patches implement the (S)RCU based proposal to optimize uprobes.

On my c^Htrusty old IVB-EP -- where each of the 40 CPUs calls 'func' in a
tight loop:

perf probe -x ./uprobes test=func
perf stat -ae probe_uprobe:test -- sleep 1

perf probe -x ./uprobes test=func%return
perf stat -ae probe_uprobe:test__return -- sleep 1
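
The source of ./uprobes is not part of this mail. As a rough sketch only, a
program of that shape could look like the following. Everything here except
the symbol name 'func' is an assumption for illustration: the helper
run_callers(), the struct worker layout, the bounded iteration count (the
benchmark described above loops for as long as perf is measuring) and the
absence of explicit CPU pinning.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Probe target. noinline keeps a real 'func' symbol in the binary so that
 * 'perf probe -x ./uprobes test=func' has something to attach to, and the
 * empty asm stops the compiler from optimizing the calls away.
 */
__attribute__((noinline)) void func(void)
{
	__asm__ volatile("" ::: "memory");
}

/* Hypothetical per-thread state; not taken from the original test. */
struct worker {
	pthread_t tid;
	uint64_t iters;	/* how many times this thread calls func() */
	uint64_t calls;	/* calls actually made, read back after join */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (uint64_t i = 0; i < w->iters; i++) {
		func();
		w->calls++;
	}
	return NULL;
}

/*
 * Start nthreads threads that each call func() iters times and return the
 * total number of calls made. Threads are not pinned; the scheduler is
 * left to spread them over the available CPUs.
 */
uint64_t run_callers(int nthreads, uint64_t iters)
{
	struct worker *w;
	uint64_t total = 0;
	int started = 0;

	assert(nthreads > 0);

	w = calloc((size_t)nthreads, sizeof(*w));
	if (!w)
		return 0;

	for (int i = 0; i < nthreads; i++) {
		w[i].iters = iters;
		if (pthread_create(&w[i].tid, NULL, worker_fn, &w[i]))
			break;
		started++;
	}

	for (int i = 0; i < started; i++) {
		pthread_join(w[i].tid, NULL);
		total += w[i].calls;
	}

	free(w);
	return total;
}
```

A main() for such a program would call run_callers() with one thread per
online CPU (for example via sysconf(_SC_NPROCESSORS_ONLN)) and an iteration
count large enough to outlast the 'sleep 1' measurement window; build with
-pthread.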

PRE:

4,038,804 probe_uprobe:test
2,356,275 probe_uprobe:test__return

POST:

7,216,579 probe_uprobe:test
6,744,786 probe_uprobe:test__return

(copy-paste FTW, I didn't do new numbers because the fast paths didn't change --
and a quick test run shows similar numbers)

Patches also available here:

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes


Changes since last time:
- better split with intermediate inc_not_zero()
- fix UPROBE_HANDLER_REMOVE
- restored the lost rcu_assign_pointer()
- avoid lockdep for uretprobe_srcu
- add missing put_uprobe() -> srcu_read_unlock() conversion
- actually initialize return_instance::has_ref
- a few comments
- things I don't remember