Re: [PATCH 0/4] jump label patches
From: Mathieu Desnoyers
Date: Tue Oct 06 2009 - 10:32:20 EST
* Masami Hiramatsu (mhiramat@xxxxxxxxxx) wrote:
> Roland McGrath wrote:
> > I think text_poke_fixup() is a good safe place to start, and it seems wise
> > to merge a version using that before worrying about anything subtler. But it's
> > almost surely overkill and makes the enable/disable switching cost pretty
> > huge. The rules as documented by Intel seem to indicate that simple
> > self-modification can work for UP and for SMP there should be some scheme
> > with IPIs that is not too terrible.
> >
> > Those can entail a multi-phase modification like the int3 patching style,
> > but int3 is not the only way to do it. int3 has the benefit of being a
> > one-byte instruction you can patch in, but also the downside of requiring
> > the trap handling hair.
>
> Hmm, would you want to put a tracepoint on the int3 handling path?
>
> > Another approach is:
> >
> > start:
> > .balign 2
> > 2: nopl
> > 7: ...
> >
> > phase 1:
> > 2: jmp 7
> > 4: <last 3 bytes of nopl>
> > 7: ...
> >
> > phase 2:
> > 2: jmp 7
> > 4: {last 3 bytes of "jmp .Ldo_trace"}
> > 7: ...
> >
> > phase 3:
> > 2: jmp .Ldo_trace
> > 7: ...
> >
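For reference, the byte-level effect of the three phases above can be
sketched in user-space C roughly as follows. This is only a model on a
plain byte array, not kernel code: the function names are made up for
illustration, sync_between_phases() is a hypothetical placeholder for
whatever cross-CPU synchronization turns out to be required, and memcpy()
only models the resulting bytes (a real implementation would need a single
aligned 16-bit store for the 2-byte writes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SITE_LEN = 5 };	/* 5-byte nopl, later a 5-byte jmp rel32 */

/* Hypothetical placeholder for cross-CPU synchronization; a no-op here. */
static void sync_between_phases(void)
{
}

/* Phase 1: a 2-byte short jmp (EB 03) that skips the remaining 3 bytes. */
static void phase1_short_jmp(uint8_t *site)
{
	const uint8_t short_jmp[2] = { 0xEB, 0x03 };

	memcpy(site, short_jmp, sizeof(short_jmp));
}

/* Phase 2: write the last 3 bytes of the final instruction while skipped. */
static void phase2_tail(uint8_t *site, const uint8_t *insn)
{
	memcpy(site + 2, insn + 2, SITE_LEN - 2);
}

/* Phase 3: write the first 2 bytes, completing the final instruction. */
static void phase3_head(uint8_t *site, const uint8_t *insn)
{
	memcpy(site, insn, 2);
}

/* Run all three phases in order on a 5-byte site. */
static void patch_in_phases(uint8_t *site, const uint8_t *insn)
{
	assert(site != NULL && insn != NULL);
	phase1_short_jmp(site);
	sync_between_phases();
	phase2_tail(site, insn);
	sync_between_phases();
	phase3_head(site, insn);
}
```

Calling patch_in_phases() on a buffer holding a 5-byte nop, with insn
pointing at the 5 bytes of the jmp, leaves the buffer equal to insn.
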
> > A scheme like that requires that the instruction to be patched be 2-byte
> > aligned so that the two-byte "jmp .+3" can be an atomic store not
> > straddling a word boundary. On x86-64 (and, according to the Intel book,
> > everything >= Pentium), you can atomically store 8 bytes when aligned. So
> > there you will usually actually be able to do this in one or two phases to
> > cover each particular 5 byte range with adequately aligned stores.
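To make the alignment argument concrete, here is a rough user-space sketch
of the two checks it implies. The helper names are invented for
illustration. Note that this only checks where the store lands relative to
aligned word boundaries; it says nothing about what instruction fetch on
another CPU actually observes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [addr, addr + len) lies inside one
 * naturally aligned window of `window` bytes (a power of two).
 */
static bool fits_in_aligned_window(uintptr_t addr, unsigned len,
				   unsigned window)
{
	assert(len > 0 && window > 0 && (window & (window - 1)) == 0);
	return (addr / window) == ((addr + len - 1) / window);
}

/* The 2-byte short jmp must not straddle a 2-byte boundary. */
static bool short_jmp_is_aligned(uintptr_t site)
{
	return fits_in_aligned_window(site, 2, 2);
}

/* The 5-byte site can be covered by one aligned 8-byte store. */
static bool site_fits_one_qword(uintptr_t site)
{
	return fits_in_aligned_window(site, 5, 8);
}
```

With these definitions a site at offset 2 or 3 within an 8-byte word fits
in one aligned 8-byte store, while a site at offset 4 does not and would
need a second phase.
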
>
> It is unclear whether we can atomically modify 2 bytes in the icache (also,
> the patched range can cross cache lines or pages.)
> I think int3 bypassing is a more generic way of patching, if you don't mind
> tracing the int3 path :-)

I think their point is that by changing only the 2nd byte of a 2-byte
jmp instruction, leaving the opcode as-is, the processor will take
either one branch or the other, because it sees a "coherent"
instruction in both cases.

However, as I point out in the immediate values thread, the problem
might run deeper: a CPU needs to see consistent instructions across
multiple reads of the same code region (due to pipeline effects of some
sort). Mere atomicity of the modification does not seem to be enough.
What I gathered from discussing with Richard J Moore is that we really
need a synchronizing instruction between the moments CPUs see the old
and new versions. int3/iret is one. Sending an IPI that executes an
mfence or cpuid is another. Both seem to be needed to ensure SMP-safe
code modification.

See the immediate values thread for details. We are waiting for Intel's
ruling on the int3+IPI scheme.
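
As a rough illustration of the ordering I have in mind for the int3+IPI
scheme, here is a user-space model on a plain byte array. It reflects my
reading of the scheme under discussion, not a procedure confirmed to be
safe. sync_all_cpus() is a hypothetical stand-in for sending an IPI to
every CPU and having each execute a serializing instruction such as cpuid;
here it only counts calls. The int3 handler that diverts CPUs hitting the
breakpoint is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC

static unsigned sync_count;

/* Hypothetical stand-in for "IPI all CPUs + serializing instruction". */
static void sync_all_cpus(void)
{
	sync_count++;
}

/* Replace `len` bytes at `site` with `insn`, breakpoint first. */
static void poke_with_int3(uint8_t *site, const uint8_t *insn, size_t len)
{
	assert(len >= 1);

	/* 1. Put int3 on the first byte so no CPU runs a torn instruction. */
	site[0] = INT3_OPCODE;
	sync_all_cpus();

	/* 2. Write every byte except the first. */
	if (len > 1) {
		memcpy(site + 1, insn + 1, len - 1);
		sync_all_cpus();
	}

	/* 3. Replace the int3 with the first byte of the new instruction. */
	site[0] = insn[0];
	sync_all_cpus();
}
```

For a 5-byte site this performs three synchronization rounds, which is
where the switching cost comes from.
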
Mathieu
>
>
> Thank you,
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: mhiramat@xxxxxxxxxx
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68