Re: [PATCH 00/10] perf/uprobe: Optimize uprobes

From: Peter Zijlstra
Date: Tue Jul 09 2024 - 03:12:14 EST


On Tue, Jul 09, 2024 at 07:56:51AM +0900, Masami Hiramatsu wrote:
> On Mon, 08 Jul 2024 11:12:41 +0200
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > Hi!
> >
> > These patches implement the (S)RCU based proposal to optimize uprobes.
> >
> > On my c^Htrusty old IVB-EP -- where each (of the 40) CPU calls 'func' in a
> > tight loop:
> >
> > perf probe -x ./uprobes test=func
> > perf stat -ae probe_uprobe:test -- sleep 1
> >
> > perf probe -x ./uprobes test=func%return
> > perf stat -ae probe_uprobe:test__return -- sleep 1
> >
> > PRE:
> >
> > 4,038,804 probe_uprobe:test
> > 2,356,275 probe_uprobe:test__return
> >
> > POST:
> >
> > 7,216,579 probe_uprobe:test
> > 6,744,786 probe_uprobe:test__return
> >
>
> Good results! So this is another series of Andrii's batch register?

Yeah, it is my counter-proposal. I didn't much like the refcounting
thing he ended up with, and his own numbers show the refcounting remains
a significant problem.

These patches mostly do away with the refcounting entirely -- except for
the extremely rare case where you let a return probe sit for over a
second without anything else happening.
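
For readers following along, here is a rough userspace model of the lifetime rule described above. It is a sketch under assumptions, not the patch code. The names (struct return_instance, ri_maybe_promote(), PROMOTE_AFTER_NS, simulate()) are hypothetical. The SRCU read-side section is not modelled at all, and the periodic "has this been pending too long?" check is modelled as a single check made just before the call returns. It only shows the shape of the idea: the common path touches no shared counter, and only a return instance left pending for a second or more pays for an atomic increment and decrement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; names are illustrative, not the patch API. */
#define PROMOTE_AFTER_NS 1000000000ULL /* "over a second", per the mail */

struct probe {
	atomic_int refcount; /* only touched on the rare slow path */
};

struct return_instance {
	struct probe *probe;
	uint64_t entry_ns; /* time the probed function was entered */
	bool holds_ref;    /* true once promoted to a real reference */
};

/* Fast path on function entry: no atomic operation at all. In the real
 * scheme an SRCU read-side section would keep the probe alive instead. */
static void ri_enter(struct return_instance *ri, struct probe *p, uint64_t now)
{
	ri->probe = p;
	ri->entry_ns = now;
	ri->holds_ref = false;
}

/* Periodic check (think: a timer). A return instance pending for too long
 * takes a real reference, so the SRCU read section could then be ended. */
static bool ri_maybe_promote(struct return_instance *ri, uint64_t now)
{
	if (ri->holds_ref || now - ri->entry_ns < PROMOTE_AFTER_NS)
		return false;
	atomic_fetch_add(&ri->probe->refcount, 1);
	ri->holds_ref = true;
	return true;
}

/* Function return: drop the reference only if one was ever taken. */
static void ri_return(struct return_instance *ri)
{
	if (ri->holds_ref) {
		int old = atomic_fetch_sub(&ri->probe->refcount, 1);
		assert(old > 0); /* refcount must never go negative */
		(void)old;
	}
	ri->probe = NULL;
}

/* Simulate n calls with the given durations (ns). The periodic check is
 * modelled as firing once, just before each call returns. Returns how
 * many calls ended up needing a refcount. */
static int simulate(struct probe *p, const uint64_t *durations, size_t n)
{
	int promoted = 0;

	for (size_t i = 0; i < n; i++) {
		struct return_instance ri;

		ri_enter(&ri, p, 0);
		if (ri_maybe_promote(&ri, durations[i]))
			promoted++;
		ri_return(&ri);
	}
	return promoted;
}
```

With call durations of 1 us, 2 ms and 1.5 s, only the last one is promoted to a refcount, and the counter is back at zero once all three have returned.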