Re: linux-next: add utrace tree

From: Jim Keniston
Date: Thu Jan 28 2010 - 19:59:42 EST

On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote:
> * Jim Keniston <jkenisto@xxxxxxxxxx> wrote:
> > On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
> > ...
> >
> > Yes, emulating "push %ebp" would buy us a lot of coverage for a lot of apps
> > on x86 (but see below**). [...]
> > [...] Even there, though, we'd have to address the page fault we'd
> > occasionally get when extending the stack vma.
> Nope, in the simplest model not even page fault emulation is needed,
> get_user()/put_user() would resolve it automatically. You either get the
> value with the pagefault resolved, or you get a -EFAULT.

get_user()/put_user() have to be done in a context where you can sleep,
right? Uprobes currently operates in such contexts, but there's some
talk of moving it all to a DIE_INT3 notifier context, where it can't
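
For concreteness, here is a rough user-space sketch of what emulating
"push %ebp" with a put_user()-style store might look like. It is not kernel
code: fake_regs, fake_mm, fake_put_user32() and emulate_push_ebp() are all
made-up names for illustration, and the "user memory" is just one flat
buffer. The point is only the ordering: attempt the store first, and commit
the register changes only if the store did not fault.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the probed task's 32-bit register state. */
struct fake_regs {
	uint32_t ip;	/* instruction pointer */
	uint32_t sp;	/* stack pointer (%esp) */
	uint32_t bp;	/* frame pointer (%ebp) */
};

/* Toy "user memory": a single mapped window [base, base + size). */
struct fake_mm {
	uint32_t base;
	uint32_t size;
	uint8_t *bytes;
};

/* Stand-in for put_user(): store 4 bytes, or fail with -EFAULT. */
static int fake_put_user32(struct fake_mm *mm, uint32_t addr, uint32_t val)
{
	if (mm->size < sizeof(val) || addr < mm->base ||
	    addr - mm->base > mm->size - sizeof(val))
		return -EFAULT;
	memcpy(mm->bytes + (addr - mm->base), &val, sizeof(val));
	return 0;
}

#define PUSH_EBP_OPCODE 0x55	/* one-byte encoding of "push %ebp" */

/*
 * Emulate "push %ebp": store %ebp just below the current stack pointer,
 * then update %esp and step past the one-byte instruction.
 * Returns 0 on success, -EFAULT if the store fails, or -EINVAL if the
 * opcode is not one this sketch emulates. Registers are left untouched
 * on any failure.
 */
static int emulate_push_ebp(struct fake_regs *regs, struct fake_mm *mm,
			    uint8_t opcode)
{
	uint32_t new_sp;
	int ret;

	if (opcode != PUSH_EBP_OPCODE)
		return -EINVAL;

	new_sp = regs->sp - 4;
	ret = fake_put_user32(mm, new_sp, regs->bp);
	if (ret)
		return ret;

	regs->sp = new_sp;
	regs->ip += 1;
	return 0;
}
```

In this sketch, a caller that gets -EINVAL or -EFAULT back would need some
other way to execute the instruction, since the register state has not been
changed.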


> > > We could get quite good coverage (and very fast
> > > emulation) for the common case in not too much code - and much of that code
> > > we already have available. No re-trapping,
> >
> > As previously discussed, boosting would also get rid of the single-step trap
> > for most instructions.
> Boosting is not in the uprobes patch-set you submitted. Even with it present
> it won't get rid of the initial INT3. So basically _best-case_ (with boosting)
> XOL-uprobes could roughly break even with a pure emulator approach ...
> That's a big and fundamental difference.

To be fair, wrt uprobes, emulation and boosting are both in the same
state: pretty well understood, but not yet implemented.

> > >
> > > - It's as transparent as it gets - no user-space trampoline or other visible
> > > state that modifies behavior or can be stomped upon by user-space bugs.
> >
> > The XOL vma isn't writable from user space, so I can't think of how it could
> > be clobbered merely by a stray memory reference. [...]
> Well there must be some purpose to the instrumentation, there must be some way
> to save data, right? If yes and it's in user-space, that data is clobberable.

One or two others have advocated an approach (which eliminates the
breakpoint trap) where trace data is stored in the uprobe vma, but I
haven't. (In such a case, "XOL vma" would be a misnomer.) I agree that
in such a scenario, the uprobe vma would of necessity be writable by the

> If it's in kernel-space then we have to enter the kernel anyway (with similar
> cost patterns to an INT3 entry) - so we just delayed the kernel entry.

This seems to presume that you have to extract trace data from the
kernel every time a probe is hit. In actual practice, you're often just
checking for unusual arg values, incrementing a counter, or some such.
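
As a rough illustration of that kind of handler, here is a sketch in plain C.
The names (probe_stats, probe_handler, threshold) are hypothetical and not
part of any uprobes interface. It bumps a counter on every hit and flags only
the hits whose argument looks unusual, so nothing needs to be exported on the
common path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-probe state, kept entirely on the handler's side. */
struct probe_stats {
	uint64_t hits;		/* total number of probe hits */
	uint64_t unusual;	/* hits whose argument exceeded threshold */
	long threshold;		/* what counts as an "unusual" argument */
};

/*
 * Called on every probe hit with the probed function's argument.
 * Returns true only when the hit is worth reporting; the common case
 * just increments a counter and returns.
 */
static bool probe_handler(struct probe_stats *st, long arg)
{
	st->hits++;
	if (arg <= st->threshold)
		return false;
	st->unusual++;
	return true;
}
```

With a threshold of 100, say, hits with args 1 and 50 would only bump the
counter, and a hit with arg 500 would be the one that gets flagged.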

> > Even if we add emulation, it seems sensible to keep the XOL approach as a
> > backup to handle instructions that aren't yet emulated (and architectures
> > that don't yet have emulators). That way, if you don't probe any unemulated
> > instructions, the XOL vma is never created.
> To turn the argument around: an in-kernel emulator is an all-around facility
> to make sure we probe safely and securely, _and_ it is also more portable
> because it's simpler (because more gradual) to implement on a new architecture
> as you don't actually have to copy around instructions (and make sure they work
> in that new place), but have to emulate a limited subset of the instruction
> space, on purely local state.

I understand the desire to start small and simple and grow gradually
from there. We thought we were doing that. Single-stepping out of line
has been in use for close to a decade, maybe more; and boosting (in
kprobes) has been around for a few years as well. To the *probes folks,
it feels pretty solid.

> With an emulator (assuming the emulator is correct) we can execute the precise
> semantics of that instruction in that place - without any side-effects from
> trampolining/replacement.

And of course, our view has been that the best way to achieve the effect
of the instruction, including all desired side-effects, is to execute
the instruction on the CPU.

> >
> > **In practice, we've had to probe all sorts of instructions, including FP
> > instructions -- especially where you want to exploit the debug info to get
> > the names, types, and locations of variables and args. For some compilers
> > and architectures, the debug info isn't reliable until the end of the
> > function prologue, at which point you could find any old instruction. Ditto
> > if you want to probe statements within a function.
> For those cases, frankly, the right approach is to fix the debug info (or
> introduce a new one) and forget the old crap.
> You treat debuginfo as some god-given property, while it's one of the suckiest
> aspects of all of Linux. But we've had that discussion months (and years) ago.
> It has improved in gcc 4.5 so there's some hope.

Yes, there seems to be considerable movement toward better debug info --
which could make statement probing (and not just function-boundary
probing) more and more feasible.

> Ingo

