Re: linux-next: add utrace tree

From: Peter Zijlstra
Date: Wed Jan 27 2010 - 05:55:57 EST


On Wed, 2010-01-27 at 02:43 -0800, Linus Torvalds wrote:
>
> On Wed, 27 Jan 2010, Peter Zijlstra wrote:
> >
> > Right, so you're going to love uprobes, which does exactly that. The
> > current proposal is overwriting the target instruction with an INT3 and
> > injecting an extra vma into the target process's address space
> > containing the original instruction(s) and possible jumps back to the
> > old code stream.
>
> Just out of interest, how does it handle the threading issue?
>
> Last I saw, at least some CPU people were _very_ nervous about overwriting
> instructions if another CPU might be just about to execute them.
>
> Even the "overwrite only the first byte with 'int3'" made them go "umm, I
> need to talk to some core CPU people to see if that's ok". They mumble
> about possible CPU errata, I$ coherency, instruction retry etc.
>
> I realize kprobes does this very thing, but kprobes is esoteric stuff and
> doesn't have much choice. In user space, you _could_ do the modification
> on a different physical page and then just switch the page table entry
> instead, and not get into the whole D$/I$ coherency thing at all.

Right, so there are two aspects:

1) concurrency when inserting the probe
2) concurrency when hitting the probe

1) used to be dealt with by using utrace to stop all threads in the
process and then writing the instruction. I suggested instead to CoW the
page, modify the instruction in the copy, set the pagetable and flush
TLBs, all while the process keeps running at full speed -- the very
thing you suggest here.
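
To make the ordering of 1) concrete, here is a toy userspace model, not
kernel code. struct toy_pte, toy_insert_probe() and the single atomic
pointer standing in for a pagetable entry are all made up for
illustration; a real implementation would update the actual PTE and
flush TLBs where the comment says so.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define INT3      0xCC  /* x86 breakpoint opcode */

/* Toy stand-in for a pagetable entry: one atomic pointer to the backing page. */
struct toy_pte {
    _Atomic(unsigned char *) page;
};

/*
 * Plant a breakpoint at `offset` without ever writing to the page that
 * other threads may be executing from.
 *
 * Returns the old page, which the caller may free once nothing can still
 * be using it, or NULL if the copy could not be allocated.
 */
unsigned char *toy_insert_probe(struct toy_pte *pte, size_t offset)
{
    assert(offset < PAGE_SIZE);

    unsigned char *old = atomic_load(&pte->page);
    unsigned char *copy = malloc(PAGE_SIZE);
    if (!copy)
        return NULL;

    memcpy(copy, old, PAGE_SIZE);    /* the "CoW" step */
    copy[offset] = INT3;             /* modify only the private copy */
    atomic_store(&pte->page, copy);  /* "set the pagetable"; TLB flush goes here */
    return old;
}
```

The point is only that the page being executed is never written to;
threads see either the old page or the fully patched copy.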

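2) the traditional approach (and the Intel arch manual describes this)
is to replace the original instruction, single-step it, and write the
probe back. This is racy with multiple threads: while the original
instruction is in place, another thread can run through the site without
hitting the probe. The current uprobes stuff solves this by doing
single-step-out-of-line (XOL).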

XOL injects a new vma into the target process and puts the old
instruction there, then it single steps on the new location, leaving the
original site with INT3.
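
A rough sketch of the XOL bookkeeping, operating on a plain byte buffer
rather than a live process. struct toy_probe, toy_arm() and
toy_resume_offset() are hypothetical names, not the uprobes code, and
the resume offset only holds for instructions that don't branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INT3     0xCC  /* x86 breakpoint opcode */
#define MAX_INSN 15    /* longest legal x86 instruction, in bytes */

/* Hypothetical per-probe state; the slot plays the role of the extra vma. */
struct toy_probe {
    size_t site;                   /* offset of the probed instruction in text[] */
    size_t len;                    /* length of that instruction */
    unsigned char slot[MAX_INSN];  /* out-of-line copy of the instruction */
};

/* Arm the probe: save the original instruction, then plant INT3 at the site. */
void toy_arm(unsigned char *text, struct toy_probe *p, size_t site, size_t len)
{
    assert(len >= 1 && len <= MAX_INSN);
    p->site = site;
    p->len = len;
    memcpy(p->slot, text + site, len);
    text[site] = INT3;
}

/*
 * On a hit the thread single-steps p->slot, not the site, so the INT3 at
 * the site is never removed and other threads cannot slip past it.
 * Returns the offset in text[] at which the thread resumes afterwards
 * (valid for a non-branching instruction only).
 */
size_t toy_resume_offset(const struct toy_probe *p)
{
    return p->site + p->len;
}
```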

This doesn't work for things like RIP-relative instructions, whose
operand address depends on where the instruction executes, so uprobes
considers them un-probable.
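
The arithmetic behind that, as a small sketch (rip_rel_target() is a
made-up helper and the addresses in the note below are arbitrary
examples): a RIP-relative operand resolves to the address of the next
instruction plus a signed 32-bit displacement, so running an unmodified
copy at a different address moves the operand by exactly the distance
between slot and site.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address a RIP-relative operand resolves to when the instruction of
 * length insn_len executes at insn_addr with displacement disp.
 */
uint64_t rip_rel_target(uint64_t insn_addr, uint64_t insn_len, int32_t disp)
{
    assert(insn_len >= 1 && insn_len <= 15);
    /* sign-extend the displacement before adding it to the next-insn address */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For example, a 7-byte instruction with displacement 0x100 at site
0x400000 refers to 0x400107; the same bytes executed from a slot at
0x7f0000001000 would refer to 0x7f0000001107 instead.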

Also, I myself really object to inserting a vma in a running process.
It's like a landlord: sure, he has the key, but he won't come in and poke
through your things.

The alternative is to place the instruction in TLS or stack space. Since
each thread can only have a single trap at a time, you only need space
for 1 instruction (plus a possible jump out to the original site). There
is the 'problem' of marking the TLS/stack executable when being probed.
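
Roughly what the per-thread variant would need in terms of space,
sketched with C11 thread-local storage. xol_slot and toy_fill_slot() are
hypothetical names, and this deliberately skips the executable-mapping
'problem' as well as writing the actual jump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INSN  15  /* longest legal x86 instruction, in bytes */
#define JMP_REL32 5   /* jmp rel32: opcode E9 plus a 32-bit displacement */

/* One slot per thread is enough: a thread handles only one trap at a time. */
static _Thread_local unsigned char xol_slot[MAX_INSN + JMP_REL32];

/*
 * Copy the displaced instruction into the calling thread's slot and
 * return the slot address. The bytes after the instruction are zeroed
 * here; a real implementation would emit the jump back there and would
 * also have to make this memory executable.
 */
unsigned char *toy_fill_slot(const unsigned char *insn, size_t len)
{
    assert(len >= 1 && len <= MAX_INSN);
    memset(xol_slot, 0, sizeof xol_slot);
    memcpy(xol_slot, insn, len);
    return xol_slot;
}
```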

Then there is the whole emulation angle; the uprobes people basically
say it's too much effort to write an x86 emulator.
