[PATCH v2 00/11] perf/uprobe: Optimize uprobes
From: Peter Zijlstra
Date: Thu Jul 11 2024 - 07:08:11 EST
Hi!
These patches implement the (S)RCU based proposal to optimize uprobes.
On my c^Htrusty old IVB-EP -- where each (of the 40) CPU calls 'func' in a
tight loop:
  perf probe -x ./uprobes test=func
  perf stat -ae probe_uprobe:test -- sleep 1

  perf probe -x ./uprobes test=func%return
  perf stat -ae probe_uprobe:test__return -- sleep 1
PRE:

  4,038,804      probe_uprobe:test
  2,356,275      probe_uprobe:test__return

POST:

  7,216,579      probe_uprobe:test
  6,744,786      probe_uprobe:test__return
(copy-paste FTW: I didn't redo the numbers because the fast paths didn't change,
and a quick test run shows similar numbers)
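The test program itself isn't part of the posting. A minimal sketch of what a ./uprobes-style binary could look like, assuming one worker thread per online CPU and no explicit CPU pinning (the scheduler is left to spread the threads). Only `func` corresponds to the probe target in the commands above; `run_loops()` and `struct loop_arg` are hypothetical names.

```c
/*
 * Hypothetical sketch of a "./uprobes"-style test binary: N threads, each
 * calling func() in a tight loop. Not the author's actual test program.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* noinline keeps func a real call target for the uprobe; the empty asm
 * with a memory clobber stops the compiler from discarding the calls. */
__attribute__((noinline)) void func(void)
{
	__asm__ __volatile__("" ::: "memory");
}

struct loop_arg {
	long iters;	/* number of calls; < 0 means loop forever */
	long done;	/* calls actually made, filled in by the worker */
};

static void *loop(void *p)
{
	struct loop_arg *arg = p;
	long n;

	if (arg->iters < 0)
		for (;;)	/* benchmark mode: run until killed */
			func();

	for (n = 0; n < arg->iters; n++)
		func();
	arg->done = n;
	return NULL;
}

/* Start nthreads workers, wait for them, return the total number of calls. */
long run_loops(int nthreads, long iters)
{
	pthread_t *tids;
	struct loop_arg *args;
	long total = 0;
	int rc;

	assert(nthreads > 0);
	tids = calloc((size_t)nthreads, sizeof(*tids));
	args = calloc((size_t)nthreads, sizeof(*args));
	assert(tids && args);

	for (int i = 0; i < nthreads; i++) {
		args[i].iters = iters;
		rc = pthread_create(&tids[i], NULL, loop, &args[i]);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++) {
		rc = pthread_join(tids[i], NULL);
		assert(rc == 0);
		total += args[i].done;
	}

	free(tids);
	free(args);
	return total;
}
```

For use with the perf commands, a main() would call run_loops((int)sysconf(_SC_NPROCESSORS_ONLN), -1) so that every CPU spins on func() until perf stat's `sleep 1` finishes and the process is killed; build with -pthread.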
Patches also available here:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes
Changes since last time:
- better split with intermediate inc_not_zero()
- fix UPROBE_HANDLER_REMOVE
- restored the lost rcu_assign_pointer()
- avoid lockdep for uretprobe_srcu
- add missing put_uprobe() -> srcu_read_unlock() conversion
- actually initialize return_instance::has_ref
- a few comments
- things I don't remember
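For readers new to the inc_not_zero() pattern mentioned in the changelog: when an object is found through an (S)RCU-protected lookup, its last reference may already have been dropped even though the free is deferred until after the grace period, so the reader may only take a reference if the count is still non-zero. A generic userspace C11 sketch of that idea, not the code from this series; ref_inc_not_zero() and ref_put() are hypothetical names, not the kernel helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical analogue of the kernel's *_inc_not_zero(): take a reference
 * only while the object is still live (count > 0). A lookup racing with
 * the final put fails instead of resurrecting a dying object. */
static inline bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;	/* last reference already dropped */
		/* on failure, old is reloaded with the current count */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Drop a reference; returns true if the caller dropped the last one and
 * is therefore responsible for (deferred) freeing of the object. */
static inline bool ref_put(atomic_int *ref)
{
	int old = atomic_fetch_sub(ref, 1);

	assert(old > 0);	/* catch refcount underflow */
	return old == 1;
}
```

The compare-and-swap loop is what makes the check and the increment a single atomic step; a plain "load, test, then increment" could race with the final ref_put() in between.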