Re: [RFC] [PATCH 1/7] User Space Breakpoint Assistance Layer (UBP)

From: Jim Keniston
Date: Mon Jan 18 2010 - 14:50:13 EST

On Mon, 2010-01-18 at 14:34 +0100, Mark Wielaard wrote:
> On Mon, 2010-01-18 at 14:53 +0200, Avi Kivity wrote:
> > On 01/18/2010 02:51 PM, Pekka Enberg wrote:
> > >
> > > And how many probes do we expect to be live at the same time in
> > > real-world scenarios? I guess Avi's "one million" is more than enough?
> > >
> > I don't think a user will ever come close to a million, but we can
> > expect some inflation from inlined functions (I don't know if uprobes
> > replicates such probes, but if it doesn't, it should).
> SystemTap by default places probes on all instances of an inlined
> function. It is still hard to get to a million probes though.
> $ stap -v -l 'process("/usr/bin/emacs").function("*")'
> [...]
> Pass 2: analyzed script: 4359 probe(s)
> You can try probing all statements (for every function, in every file,
> on every line of source code), but even that only adds up to tens of
> thousands of probes:
> $ stap -v -l 'process("/usr/bin/emacs").statement("*@*:*")'
> [...]
> Pass 2: analyzed script: 39603 probe(s)
> So a million is pretty far out, even if you add larger programs and all
> the shared libraries they are using.

Thanks, Mark. One correction, below.

> As Srikar said the current allocation technique is the simplest you can
> do, one xol slot for each uprobe. But there are other techniques that
> you can use. Theoretically you only need a xol slot for each thread of a
> process that simultaneously hits a uprobe instance. That requires a bit
> more bookkeeping. The variant of uprobes that systemtap uses at the
> moment does that.

Actually, it's per-probepoint, with a fixed number of slots. If the
probepoint you just hit doesn't have a slot, and none are free, you
steal a slot from another probepoint. Yeah, it's messy.
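
To make the stealing scheme concrete, here is a minimal sketch in C. It
is not the systemtap uprobes code: every name in it (xol_area,
probepoint, xol_get_slot, NSLOTS) is made up for illustration, the
round-robin choice of victim is an assumption (nothing above says how
the victim is picked), and locking is left out entirely, as is the check
that a slot is not currently being single-stepped by some thread.

```c
#include <stddef.h>

#define NSLOTS  4       /* fixed number of XOL slots (illustrative value) */
#define NO_SLOT (-1)

/* Hypothetical probepoint: remembers which slot, if any, it owns. */
struct probepoint {
    int slot;
};

/* Hypothetical XOL area bookkeeping. */
struct xol_area {
    struct probepoint *owner[NSLOTS];  /* owner of each slot, NULL if free */
    int next_victim;                   /* round-robin steal pointer (assumed policy) */
    unsigned long copies;              /* how often an instruction was written to a slot */
};

static void xol_init(struct xol_area *a)
{
    for (int i = 0; i < NSLOTS; i++)
        a->owner[i] = NULL;
    a->next_victim = 0;
    a->copies = 0;
}

static void pp_init(struct probepoint *pp)
{
    pp->slot = NO_SLOT;
}

/*
 * Return the slot for the probepoint that was just hit.
 * Order of preference: the slot it already owns, a free slot,
 * a slot stolen from another probepoint.
 */
static int xol_get_slot(struct xol_area *a, struct probepoint *pp)
{
    int i;

    if (pp->slot != NO_SLOT)
        return pp->slot;

    for (i = 0; i < NSLOTS; i++)
        if (a->owner[i] == NULL)
            goto take;

    /* None free: steal one. The previous owner loses its slot. */
    i = a->next_victim;
    a->next_victim = (i + 1) % NSLOTS;
    a->owner[i]->slot = NO_SLOT;

take:
    a->owner[i] = pp;
    pp->slot = i;
    a->copies++;    /* real code would copy the probed instruction here */
    return i;
}
```

With NSLOTS probepoints or fewer, each one keeps its slot after the
first hit. Every probepoint beyond that forces a steal and a fresh copy
into the XOL area, which is where the mess comes from.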

We considered allocating slots per-thread, hoping to make it basically
lockless, but that way there's more likely to be constant scribbling on
the XOL area, as a thread with n slots cycles through n+m probepoints.
And of course, it gets dicey as the process clones more threads.
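
The scribbling problem can be shown with a small simulation. This is
only a sketch under stated assumptions: one thread owns nslots private
slots, hits nprobes probepoints in a fixed cycle, and evicts the least
recently used slot on a miss. The LRU policy and the function name
count_rewrites are assumptions for illustration; other replacement
policies would give different counts.

```c
#define MAX_SLOTS 16

/*
 * Count how many times a slot has to be (re)written when one thread
 * with `nslots` private slots cycles through `nprobes` probepoints
 * for `rounds` passes. Assumes LRU replacement.
 */
static unsigned long count_rewrites(int nslots, int nprobes, int rounds)
{
    int slot_pp[MAX_SLOTS];             /* probepoint held by each slot, -1 = empty */
    unsigned long last_use[MAX_SLOTS];  /* tick of the most recent use of each slot */
    unsigned long tick = 0, rewrites = 0;

    if (nslots < 1 || nslots > MAX_SLOTS)
        return 0;

    for (int s = 0; s < nslots; s++) {
        slot_pp[s] = -1;
        last_use[s] = 0;
    }

    for (int r = 0; r < rounds; r++) {
        for (int pp = 0; pp < nprobes; pp++) {
            int hit = -1, victim = 0;

            tick++;
            for (int s = 0; s < nslots; s++) {
                if (slot_pp[s] == pp)
                    hit = s;
                if (last_use[s] < last_use[victim])
                    victim = s;         /* least recently used so far */
            }
            if (hit < 0) {
                /* Miss: overwrite the LRU slot with this probepoint's insn. */
                slot_pp[victim] = pp;
                rewrites++;
                hit = victim;
            }
            last_use[hit] = tick;
        }
    }
    return rewrites;
}
```

Under these assumptions, count_rewrites(4, 4, 10) is 4: the slots are
filled once and never touched again. count_rewrites(4, 6, 10) is 60:
with only two more probepoints than slots, every single hit rewrites a
slot.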

I guess the point is, there are a lot of ways to allocate slots, and we
haven't found the perfect algorithm yet, even if you accept the
existence of (and need for) the XOL area. Keep the ideas coming.

> But the locking in that case is pretty tricky, so it
> seemed easier to first get the code with the simplest xol allocation
> technique upstream. But if you do that then you can use a very small xol
> area to support millions of uprobes and only have to expand it when
> there are hundreds of threads in a process all hitting the probes
> simultaneously.
> Cheers,
> Mark

