Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second

From: Paul E. McKenney
Date: Wed Jan 22 2020 - 21:26:54 EST


On Thu, Jan 23, 2020 at 10:33:34AM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 08:54:32 -0800
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:
>
> > On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> > > On Wed, 22 Jan 2020 07:11:15 -0500
> > > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > >
> > > > On Wed, 22 Jan 2020 16:23:17 +0900
> > > > Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > > >
> > > > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > > > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> > > > >
> > > > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > > > Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > > > short to wait for other probes, make it longer,
> > > > > > > like 1 second.
> > > > > >
> > > > > > Hi Masami,
> > > > > >
> > > > > > Can you explain more *why* 5 jiffies is too short.
> > > > >
> > > > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > > > and unregistration like systemtap, which will use array-based interface to
> > > > > register/unregister. In that case, 5 jiffies will be enough for the delay
> > > > > to wait for other kprobe registration/unregsitration.
> > > > >
> > > > > However, since perf and ftrace register/unregister probes one-by-one with
> > > > > RCU synchronization interval, the optimizer will be started before
> > > > > finishing to register/unregister all probes.
> > > > > And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> > > > > Since the kprobe_mutex is also involved in disabling kprobes, this also
> > > > > stops probe-event disabling.
> > > > >
> > > > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > > > not enough for dozens of probe events.
> > > > >
> > > >
> > > > Perhaps we should have a mechanism that can detect new probes being
> > > > added, and just continue to delay the optimization, instead of having
> > > > some arbitrary delay.
> > >
> > > Yes, that is what [03/13] does :)
> > > Anyway, it seems that the RCU-synchronization takes more than 5 jiffies.
> > > And in that case, [03/13] still doesn't work. That's why I added this patch
> > > after that.
> >
> > If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
> > will often take way more than 5 jiffies. If it is synchronize_rcu(),
> > 5 jiffies would not be unusual, especially on larger systems.
> > But in the case of synchronize_rcu(), one option is to instead use
> > synchronize_rcu_expedited(). It is not clear that this last is really
> > justified in this case, but figured it might be worth mentioning.
>
> It is synchronize_rcu(), but it seems sometimes it is called several
> times on one probe disabling.

OK, good to know.

> Anyway, without this update, I added a printk to count optimizer
> queue-length and found that all optimizer call with a single kprobe
> on the quenes. I think this depends on the machine, but as far as
> I tested on 8-threads qemu x86, shows this result...
>
> Probes: 256 kprobe_events
> Enable events
> real 0m 0.02s
> user 0m 0.00s
> sys 0m 0.02s
> [ 17.730548] Queue-update: 180, skipped, Total Queued: 180
> [ 17.739445] Queue-update: 1, go, Total Queued: 180
> Disable events
> [ 19.744634] Queue-update: 1, go, Total Queued: 1
> [ 19.770743] Queue-update: 1, go, Total Queued: 1
> [ 19.886625] Queue-update: 1, go, Total Queued: 1
> [ 20.006631] Queue-update: 1, go, Total Queued: 1
> [ 20.126628] Queue-update: 1, go, Total Queued: 1
> [ 20.246620] Queue-update: 1, go, Total Queued: 1
> [ 20.374665] Queue-update: 1, go, Total Queued: 1
> [ 20.486617] Queue-update: 1, go, Total Queued: 1
> [ 20.606608] Queue-update: 1, go, Total Queued: 1
> [ 20.726596] Queue-update: 1, go, Total Queued: 1
> [ 20.846608] Queue-update: 1, go, Total Queued: 1
> [ 20.966723] Queue-update: 1, go, Total Queued: 1
> [ 21.086611] Queue-update: 1, go, Total Queued: 1
> [ 21.206605] Queue-update: 1, go, Total Queued: 1
> [ 21.326603] Queue-update: 1, go, Total Queued: 1
> [ 21.462609] Queue-update: 1, go, Total Queued: 1
> [ 21.566755] Queue-update: 1, go, Total Queued: 1
> [ 21.686606] Queue-update: 1, go, Total Queued: 1
> [ 21.806611] Queue-update: 1, go, Total Queued: 1
> [ 21.926609] Queue-update: 1, go, Total Queued: 1
> [ 22.046621] Queue-update: 1, go, Total Queued: 1
> [ 22.166729] Queue-update: 1, go, Total Queued: 1
> [ 22.302609] Queue-update: 1, go, Total Queued: 1
> [ 22.510627] Queue-update: 1, go, Total Queued: 1
> [ 22.536638] Queue-update: 1, go, Total Queued: 1
> [ 22.654618] Queue-update: 1, go, Total Queued: 1
> [ 22.774643] Queue-update: 1, go, Total Queued: 1
> [ 22.902609] Queue-update: 1, go, Total Queued: 1
> [ 23.022608] Queue-update: 1, go, Total Queued: 1
> [ 23.158606] Queue-update: 1, go, Total Queued: 1
> [ 23.254618] Queue-update: 1, go, Total Queued: 1
> [ 23.374647] Queue-update: 1, go, Total Queued: 1
> [ 23.494605] Queue-update: 1, go, Total Queued: 1
> [ 23.614610] Queue-update: 1, go, Total Queued: 1
> [ 23.734606] Queue-update: 1, go, Total Queued: 1
> [ 23.854609] Queue-update: 1, go, Total Queued: 1
> [ 23.974615] Queue-update: 1, go, Total Queued: 1
> [ 24.094609] Queue-update: 1, go, Total Queued: 1
> [ 24.230607] Queue-update: 1, go, Total Queued: 1
> [ 24.342625] Queue-update: 1, go, Total Queued: 1
> [ 24.475478] Queue-update: 1, go, Total Queued: 1
> [ 24.574554] Queue-update: 1, go, Total Queued: 1
> [ 24.694605] Queue-update: 1, go, Total Queued: 1
> [ 24.814585] Queue-update: 1, go, Total Queued: 1
> [ 24.934591] Queue-update: 1, go, Total Queued: 1
> [ 25.054614] Queue-update: 1, go, Total Queued: 1
> [ 25.174628] Queue-update: 1, go, Total Queued: 1
> [ 25.294637] Queue-update: 1, go, Total Queued: 1
> [ 25.414620] Queue-update: 1, go, Total Queued: 1
> [ 25.534553] Queue-update: 1, go, Total Queued: 1
> [ 25.654610] Queue-update: 1, go, Total Queued: 1
> [ 25.774627] Queue-update: 1, go, Total Queued: 1
> [ 25.894609] Queue-update: 1, go, Total Queued: 1
> [ 26.030548] Queue-update: 1, go, Total Queued: 1
> [ 26.134626] Queue-update: 1, go, Total Queued: 1
> [ 26.254620] Queue-update: 1, go, Total Queued: 1
> [ 26.373603] Queue-update: 1, go, Total Queued: 1
> [ 26.502606] Queue-update: 1, go, Total Queued: 1
> [ 26.614607] Queue-update: 1, go, Total Queued: 1
> [ 26.734610] Queue-update: 1, go, Total Queued: 1
> [ 26.854620] Queue-update: 1, go, Total Queued: 1
> [ 26.974660] Queue-update: 1, go, Total Queued: 1
> [ 27.094620] Queue-update: 1, go, Total Queued: 1
> [ 27.214596] Queue-update: 1, go, Total Queued: 1
> [ 27.334640] Queue-update: 1, go, Total Queued: 1
> [ 27.494606] Queue-update: 1, go, Total Queued: 1
> [ 27.574703] Queue-update: 1, go, Total Queued: 1
> [ 27.694609] Queue-update: 1, go, Total Queued: 1
> [ 27.814607] Queue-update: 1, go, Total Queued: 1
> [ 27.934631] Queue-update: 1, go, Total Queued: 1
> [ 28.054606] Queue-update: 1, go, Total Queued: 1
> [ 28.182685] Queue-update: 1, go, Total Queued: 1
> [ 28.318568] Queue-update: 1, go, Total Queued: 1
> [ 28.422610] Queue-update: 1, go, Total Queued: 1
> [ 28.534621] Queue-update: 1, go, Total Queued: 1
> [ 28.654618] Queue-update: 1, go, Total Queued: 1
> [ 28.774622] Queue-update: 1, go, Total Queued: 1
> [ 28.894609] Queue-update: 1, go, Total Queued: 1
> [ 29.022609] Queue-update: 1, go, Total Queued: 1
> [ 29.150615] Queue-update: 1, go, Total Queued: 1
> [ 29.262610] Queue-update: 1, go, Total Queued: 1
> [ 29.374621] Queue-update: 1, go, Total Queued: 1
> [ 29.494606] Queue-update: 1, go, Total Queued: 1
> [ 29.614616] Queue-update: 1, go, Total Queued: 1
> [ 29.734607] Queue-update: 1, go, Total Queued: 1
> [ 29.854601] Queue-update: 1, go, Total Queued: 1
> [ 29.974610] Queue-update: 1, go, Total Queued: 1
> [ 30.094625] Queue-update: 1, go, Total Queued: 1
> [ 30.214606] Queue-update: 1, go, Total Queued: 1
> [ 30.334602] Queue-update: 1, go, Total Queued: 1
> [ 30.454634] Queue-update: 1, go, Total Queued: 1
> [ 30.574606] Queue-update: 1, go, Total Queued: 1
> [ 30.694589] Queue-update: 1, go, Total Queued: 1
> [ 30.814613] Queue-update: 1, go, Total Queued: 1
> [ 30.934602] Queue-update: 1, go, Total Queued: 1
> [ 31.054605] Queue-update: 1, go, Total Queued: 1
> [ 31.182596] Queue-update: 1, go, Total Queued: 1
> [ 31.318621] Queue-update: 1, go, Total Queued: 1
> [ 31.414615] Queue-update: 1, go, Total Queued: 1
> [ 31.534610] Queue-update: 1, go, Total Queued: 1
> [ 31.670608] Queue-update: 1, go, Total Queued: 1
> [ 31.774626] Queue-update: 1, go, Total Queued: 1
> [ 31.894607] Queue-update: 1, go, Total Queued: 1
> [ 32.014609] Queue-update: 1, go, Total Queued: 1
> [ 32.134607] Queue-update: 1, go, Total Queued: 1
> [ 32.254611] Queue-update: 1, go, Total Queued: 1
> [ 32.374608] Queue-update: 1, go, Total Queued: 1
> [ 32.494619] Queue-update: 1, go, Total Queued: 1
> [ 32.614607] Queue-update: 1, go, Total Queued: 1
> [ 32.734612] Queue-update: 1, go, Total Queued: 1
> [ 32.862616] Queue-update: 1, go, Total Queued: 1
> [ 32.974620] Queue-update: 1, go, Total Queued: 1
> [ 33.110609] Queue-update: 1, go, Total Queued: 1
> [ 33.214609] Queue-update: 1, go, Total Queued: 1
> [ 33.342611] Queue-update: 1, go, Total Queued: 1
> [ 33.454607] Queue-update: 1, go, Total Queued: 1
> [ 33.574611] Queue-update: 1, go, Total Queued: 1
> [ 33.694619] Queue-update: 1, go, Total Queued: 1
> [ 33.814607] Queue-update: 1, go, Total Queued: 1
> [ 33.950614] Queue-update: 1, go, Total Queued: 1
> [ 34.062609] Queue-update: 1, go, Total Queued: 1
> [ 34.174609] Queue-update: 1, go, Total Queued: 1
> [ 34.294619] Queue-update: 1, go, Total Queued: 1
> [ 34.430533] Queue-update: 1, go, Total Queued: 1
> [ 34.534594] Queue-update: 1, go, Total Queued: 1
> [ 34.654605] Queue-update: 1, go, Total Queued: 1
> [ 34.790596] Queue-update: 1, go, Total Queued: 1
> [ 34.902611] Queue-update: 1, go, Total Queued: 1
> [ 35.014630] Queue-update: 1, go, Total Queued: 1
> [ 35.134634] Queue-update: 1, go, Total Queued: 1
> [ 35.262608] Queue-update: 1, go, Total Queued: 1
> [ 35.374634] Queue-update: 1, go, Total Queued: 1
> [ 35.494617] Queue-update: 1, go, Total Queued: 1
> [ 35.622608] Queue-update: 1, go, Total Queued: 1
> [ 35.742610] Queue-update: 1, go, Total Queued: 1
> [ 35.854572] Queue-update: 1, go, Total Queued: 1
> [ 35.982596] Queue-update: 1, go, Total Queued: 1
> [ 36.094603] Queue-update: 1, go, Total Queued: 1
> [ 36.222612] Queue-update: 1, go, Total Queued: 1
> [ 36.334610] Queue-update: 1, go, Total Queued: 1
> [ 36.454619] Queue-update: 1, go, Total Queued: 1
> [ 36.574619] Queue-update: 1, go, Total Queued: 1
> [ 36.694643] Queue-update: 1, go, Total Queued: 1
> [ 36.814614] Queue-update: 1, go, Total Queued: 1
> [ 36.934610] Queue-update: 1, go, Total Queued: 1
> [ 37.062609] Queue-update: 1, go, Total Queued: 1
> [ 37.198611] Queue-update: 1, go, Total Queued: 1
> [ 37.294618] Queue-update: 1, go, Total Queued: 1
> [ 37.414618] Queue-update: 1, go, Total Queued: 1
> [ 37.534595] Queue-update: 1, go, Total Queued: 1
> [ 37.662594] Queue-update: 1, go, Total Queued: 1
> [ 37.774610] Queue-update: 1, go, Total Queued: 1
> [ 37.894618] Queue-update: 1, go, Total Queued: 1
> [ 38.014619] Queue-update: 1, go, Total Queued: 1
> [ 38.142612] Queue-update: 1, go, Total Queued: 1
> [ 38.254609] Queue-update: 1, go, Total Queued: 1
> [ 38.374619] Queue-update: 1, go, Total Queued: 1
> [ 38.502481] Queue-update: 1, go, Total Queued: 1
> [ 38.614596] Queue-update: 1, go, Total Queued: 1
> [ 38.734597] Queue-update: 1, go, Total Queued: 1
> [ 38.854606] Queue-update: 1, go, Total Queued: 1
> [ 38.974620] Queue-update: 1, go, Total Queued: 1
> [ 39.094617] Queue-update: 1, go, Total Queued: 1
> [ 39.222597] Queue-update: 1, go, Total Queued: 1
> [ 39.334610] Queue-update: 1, go, Total Queued: 1
> [ 39.454609] Queue-update: 1, go, Total Queued: 1
> [ 39.574633] Queue-update: 1, go, Total Queued: 1
> [ 39.694611] Queue-update: 1, go, Total Queued: 1
> [ 39.814608] Queue-update: 1, go, Total Queued: 1
> [ 39.934610] Queue-update: 1, go, Total Queued: 1
> [ 40.054621] Queue-update: 1, go, Total Queued: 1
> [ 40.174611] Queue-update: 1, go, Total Queued: 1
> [ 40.297471] Queue-update: 1, go, Total Queued: 1
> [ 40.414504] Queue-update: 1, go, Total Queued: 1
> [ 40.534601] Queue-update: 1, go, Total Queued: 1
> [ 40.654611] Queue-update: 1, go, Total Queued: 1
> [ 40.774609] Queue-update: 1, go, Total Queued: 1
> [ 40.894628] Queue-update: 1, go, Total Queued: 1
> [ 41.014608] Queue-update: 1, go, Total Queued: 1
> [ 41.135594] Queue-update: 1, go, Total Queued: 1
> real 0m 21.40s
> user 0m 0.00s
> sys 0m 0.04s

So 21.4s/256 = 84 milliseconds per event disable, correct?

It might be worth trying synchronize_rcu_expedited() just as an experiment
to see if it speeds things up significantly.

Thanx, Paul

> Remove events
> real 0m 2.14s
> user 0m 0.00s
> sys 0m 0.04s
>
>
> Thank you,
>
>
> --
> Masami Hiramatsu <mhiramat@xxxxxxxxxx>