Re: linux-next: add utrace tree

From: Ingo Molnar
Date: Wed Jan 27 2010 - 03:55:35 EST

* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> On Tue, 2010-01-26 at 15:37 -0800, Linus Torvalds wrote:
> >
> > On Tue, 26 Jan 2010, Tom Tromey wrote:
> > >
> > > In non-stop mode (where you can stop one thread but leave the others
> > > running), gdb wants to have the breakpoints always inserted. So,
> > > something must emulate the displaced instruction.
> >
> > I'm almost totally uninterested in breakpoints that actually re-write
> > instructions. It's impossible to do that efficiently and well, especially
> > in threaded environments.
> >
> > So if you do instruction rewriting, I can only say "that's your problem".
>
> Right, so you're going to love uprobes, which does exactly that. The current
> proposal is overwriting the target instruction with an INT3 and injecting an
> extra vma into the target process's address space containing the original
> instruction(s) and possible jumps back to the old code stream.
>
> I'm all in favor of not doing that extra vma and instead using stack or TLS
> space, but then people complain about having to make that executable (which
> is something I don't really mind, x86 had executable everything for very
> long, and also, it's only so when debugging the thing anyway).

I think the best solution for user probes (by far) is to use a simplified
in-kernel instruction emulator for the few commonly probed instructions.
(Kprobes already partially decodes x86 instructions to make it safe to apply
accelerated probes and there's other decoding logic in the kernel too.)

The design and practical advantages are numerous:

- People want to probe their function prologues most of the time ...
a single INT3 there will in most cases just hit the initial stack
allocation and that's it. We could get quite good coverage (and very fast
emulation) for the common case in not too much code - and much of that code
we already have available. No re-trapping, no extra instruction patching,
and no complex maintenance of trampolines.

- It's as transparent as it gets - no user-space trampoline or other visible
state that modifies behavior or can be stomped upon by user-space bugs.

- Lightweight and simple probe insertion: no weird setup sequence needing the
stopping of all tasks to install the trampoline. We just add the INT3 and
off you go.

- Emulation is evidently thread-safe, SMP-safe, etc. as it only acts on
task local state.

- The points we can probe are never truly limited as it's all freely
extensible: if you cannot probe an instruction you want to probe today,
extend the emulator. Deny the rest. _All_ versions of uprobes code i've
seen so far already restrict the probe-compatible instruction set:
RIP-relative instructions are excluded on 64-bit for example.

- Emulation has the _least_ semantic side effects as we really execute
'that' instruction - not some other instruction put elsewhere into a
special vma or into the process/thread stack, or some special in-kernel
trampoline, etc.

- Emulation can be very fast for the common case as well. Nobody will probe
weird, complex instructions. They will use 'perf probe' to insert probes
into their functions 90% of the time ...

- FPU, complex-op and pagefault emulation are not really what i'd expect
to be necessary for simple probing - but they _can_ be added by people who
care about them, if they so wish.

Such a scheme would be _far_ preferable from a maintenance POV as well,
as the initial code will be small, and we can extend it gradually. All the
other proposals are complex 'all or nothing' schemes with no flexibility
in how much complexity we take on.
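
To make the prologue case concrete, here is a rough user-space sketch of
what "emulate the common instructions, deny the rest" could look like. It is
not kernel code and not taken from any uprobes patch: struct sketch_regs,
sketch_emulate(), sketch_demo() and the byte array standing in for the user
stack are all hypothetical names made up for illustration. It handles only
three typical x86-64 prologue instructions and ignores EFLAGS, which a real
emulator would have to update for the SUB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical task-local register state; not a kernel structure. */
struct sketch_regs {
	uint64_t rip, rsp, rbp;
};

#define STACK_SIZE 256
static uint8_t stack_mem[STACK_SIZE];	/* stands in for the user stack */

/*
 * Emulate the single instruction at insn[]. Acts only on the given
 * register state and the stack. Returns 0 on success, or -1 if the
 * instruction is not supported (i.e. the probe would be denied).
 * Note: EFLAGS is not modelled in this sketch.
 */
static int sketch_emulate(struct sketch_regs *r, const uint8_t *insn,
			  size_t len)
{
	/* push %rbp (55) */
	if (len >= 1 && insn[0] == 0x55) {
		if (r->rsp < 8 || r->rsp > STACK_SIZE)
			return -1;
		r->rsp -= 8;
		memcpy(&stack_mem[r->rsp], &r->rbp, 8);
		r->rip += 1;
		return 0;
	}
	/* mov %rsp,%rbp (48 89 e5) */
	if (len >= 3 && insn[0] == 0x48 && insn[1] == 0x89 &&
	    insn[2] == 0xe5) {
		r->rbp = r->rsp;
		r->rip += 3;
		return 0;
	}
	/* sub $imm8,%rsp (48 83 ec ib), imm8 is sign-extended */
	if (len >= 4 && insn[0] == 0x48 && insn[1] == 0x83 &&
	    insn[2] == 0xec) {
		r->rsp -= (uint64_t)(int64_t)(int8_t)insn[3];
		r->rip += 4;
		return 0;
	}
	return -1;	/* deny the rest */
}

/*
 * Run a typical prologue through the emulator and return the final rsp.
 * Here rip is simply a byte offset into the prologue[] buffer.
 */
static uint64_t sketch_demo(void)
{
	static const uint8_t prologue[] = {
		0x55,			/* push %rbp */
		0x48, 0x89, 0xe5,	/* mov %rsp,%rbp */
		0x48, 0x83, 0xec, 0x20,	/* sub $0x20,%rsp */
	};
	struct sketch_regs r = { .rip = 0, .rsp = STACK_SIZE, .rbp = 0x1234 };

	while (r.rip < sizeof(prologue)) {
		int ret = sketch_emulate(&r, prologue + r.rip,
					 sizeof(prologue) - r.rip);
		assert(ret == 0);
		(void)ret;
	}
	return r.rsp;
}
```

Extending coverage in this model means adding another decode case; any
opcode without a case falls through to the final return and is refused,
which is the "deny the rest" policy described above.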
