Re: [RFC PATCH 2/6] jump label v3 - x86: Introduce generic jump patching without stop_machine

From: Masami Hiramatsu
Date: Fri Nov 20 2009 - 19:06:59 EST


Hi Peter,

H. Peter Anvin wrote:
> On 11/18/2009 02:43 PM, Jason Baron wrote:
>> Add text_poke_fixup(), which takes a fixup address to which a processor
>> jumps if it hits the address being modified while the code is being
>> modified. text_poke_fixup() performs the following steps for this purpose.
>>
>> 1. Setup int3 handler for fixup.
>> 2. Put a breakpoint (int3) on the first byte of modifying region,
>> and synchronize code on all CPUs.
>> 3. Modify other bytes of modifying region, and synchronize code on all CPUs.
>> 4. Modify the first byte of modifying region, and synchronize code
>> on all CPUs.
>> 5. Clear int3 handler.
>>
>> Thus, if some other processor executes the address being modified between
>> step 2 and step 4, it will jump to the fixup code.
>>
>> This still has many limitations for modifying multiple instructions at once.
>> However, it is enough for patching 'a 5-byte nop replaced with a jump',
>> because:
>> - The replaced instruction is just one instruction, which is executed atomically.
>> - The replacing instruction is a jump, so we can set the fixup address to
>> where the jump goes.
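
To make the write order in steps 2 to 4 concrete, here is a minimal user-space sketch. It is not the patch itself. It assumes a plain byte buffer in place of kernel text, a no-op sync_all_cpus() as a hypothetical stand-in for the cross-CPU synchronization, and an arbitrary jump displacement. The helper names (classify, poke_with_int3, poke_naive) are made up for illustration. After every single-byte store it checks what a CPU fetching at the patch address could see.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INSN_LEN    5
#define INT3_OPCODE 0xCC

/* 5-byte nop and a jmp rel32; the displacement bytes are arbitrary. */
static const uint8_t nop5[INSN_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t jmp5[INSN_LEN] = { 0xe9, 0x10, 0x20, 0x30, 0x40 };

enum view { VIEW_OLD, VIEW_TRAP, VIEW_NEW, VIEW_TORN };

/* Hypothetical stand-in for "synchronize code on all CPUs"; no-op here. */
static void sync_all_cpus(void)
{
}

/* What would a CPU fetching at the patch address see right now? */
static enum view classify(const uint8_t *text, const uint8_t *old_insn,
                          const uint8_t *new_insn)
{
    if (text[0] == INT3_OPCODE)
        return VIEW_TRAP;   /* traps, so the int3 handler can redirect */
    if (!memcmp(text, old_insn, INSN_LEN))
        return VIEW_OLD;
    if (!memcmp(text, new_insn, INSN_LEN))
        return VIEW_NEW;
    return VIEW_TORN;       /* mix of old and new bytes */
}

/* Steps 2 to 4; returns how many intermediate states were torn. */
static int poke_with_int3(uint8_t *text, const uint8_t *old_insn,
                          const uint8_t *new_insn)
{
    int torn = 0;
    int i;

    assert(text[0] != INT3_OPCODE); /* no patch already in progress */

    text[0] = INT3_OPCODE;          /* step 2: breakpoint on first byte */
    sync_all_cpus();
    torn += classify(text, old_insn, new_insn) == VIEW_TORN;

    for (i = 1; i < INSN_LEN; i++) { /* step 3: the other bytes */
        text[i] = new_insn[i];
        torn += classify(text, old_insn, new_insn) == VIEW_TORN;
    }
    sync_all_cpus();

    text[0] = new_insn[0];          /* step 4: the first byte */
    sync_all_cpus();
    torn += classify(text, old_insn, new_insn) == VIEW_TORN;

    return torn;
}

/* For contrast: plain front-to-back byte writes without the int3. */
static int poke_naive(uint8_t *text, const uint8_t *old_insn,
                      const uint8_t *new_insn)
{
    int torn = 0;
    int i;

    for (i = 0; i < INSN_LEN; i++) {
        text[i] = new_insn[i];
        torn += classify(text, old_insn, new_insn) == VIEW_TORN;
    }
    return torn;
}
```

Starting from nop5, poke_with_int3() reports 0 torn states, while poke_naive() reports 4 with these byte values. The sketch says nothing about instruction fetch or cache behaviour on real hardware; it only shows the ordering argument.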


> I just had a thought about this... regardless of if this is safe or not
> (which still remains to be determined)... I have a bit more of a
> fundamental question about it:
>
> This code ends up taking *two* global IPIs for each instruction
> modification. Each of those requires whole-system synchronization.

As Mathieu and I discussed, the first IPI is for synchronizing code, and
the second is for waiting until all int3 handling is done.

> How is this better than taking one IPI and having the other CPUs wait
> until the modification is complete before returning?

Do you mean using stop_machine()? :-)

If we didn't care about NMI, we could use stop_machine() (this is
why kprobe-jump-optimization can use stop_machine(): kprobes
can't probe NMI code), but tracepoints have to support NMI.

Actually, it might be possible, even though it would be complicated.
If one-byte modification (int3 injection/removal) is always
synchronized, I assume the timechart below can work
(and it can support NMI/SMI too).

----
<CPU0>                          <CPU1>
flag = 0
setup int3 handler
int3 injection[sync]
other-bytes modifying
smp_call_function(func)         func()
wait_until(flag==1)             irq_disable()
                                sync_core() for other-bytes modifying
                                flag = 1
first-byte modifying[sync]      wait_until(flag==2)
flag = 2
wait_until(flag==3)             irq_enable()
                                flag = 3
cleanup int3 handler            return
return
----
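
The timechart can also be written as a small user-space simulation. This is only a sketch under assumptions: one pthread stands in for CPU1, pthread_create() stands in for smp_call_function(func), a C11 atomic plays the role of flag, and a byte array plays the role of kernel text. irq_disable(), irq_enable(), sync_core() and the int3 handler have no user-space equivalent here and appear only as comments. The names cpu1_func and patch_with_handshake are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define INSN_LEN    5
#define INT3_OPCODE 0xCC

static const uint8_t nop5[INSN_LEN] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
static const uint8_t jmp5[INSN_LEN] = { 0xe9, 0x10, 0x20, 0x30, 0x40 };

static uint8_t text[INSN_LEN];  /* stand-in for the patched kernel text */
static atomic_int flag;         /* the "flag" from the timechart */
static int tail_ok, head_ok;    /* what CPU1 observed at each stage */

static void wait_until(int value)
{
    while (atomic_load(&flag) != value)
        sched_yield();          /* spin politely in user space */
}

/* CPU1 column of the timechart. */
static void *cpu1_func(void *unused)
{
    (void)unused;
    /* irq_disable() would go here in the kernel. */
    /* sync_core() for other-bytes modifying: here we only observe. */
    tail_ok = text[0] == INT3_OPCODE &&
              !memcmp(text + 1, jmp5 + 1, INSN_LEN - 1);
    atomic_store(&flag, 1);
    wait_until(2);
    head_ok = text[0] == jmp5[0];
    /* irq_enable() would go here in the kernel. */
    atomic_store(&flag, 3);
    return NULL;
}

/* CPU0 column of the timechart; returns 0 if every stage looked right. */
static int patch_with_handshake(void)
{
    pthread_t cpu1;

    memcpy(text, nop5, INSN_LEN);
    atomic_store(&flag, 0);
    /* setup int3 handler: omitted in this simulation. */
    text[0] = INT3_OPCODE;                      /* int3 injection */
    memcpy(text + 1, jmp5 + 1, INSN_LEN - 1);   /* other-bytes modifying */
    if (pthread_create(&cpu1, NULL, cpu1_func, NULL))
        return -1;                              /* smp_call_function(func) */
    wait_until(1);
    text[0] = jmp5[0];                          /* first-byte modifying */
    atomic_store(&flag, 2);
    wait_until(3);
    /* cleanup int3 handler: omitted in this simulation. */
    pthread_join(cpu1, NULL);

    return (tail_ok && head_ok && !memcmp(text, jmp5, INSN_LEN)) ? 0 : -1;
}
```

With a single remote thread the handshake is easy to follow; with many CPUs each flag step would have to become a counter or a per-CPU flag, which is part of why it is unclear whether this beats 2 IPIs.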

I'm not sure that this flag-based, step-by-step code would
be faster than 2 IPIs :-(

Any comments are welcome! :-)

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: mhiramat@xxxxxxxxxx
