Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

From: Peter Zijlstra
Date: Mon Jun 17 2019 - 10:47:54 EST

On Wed, Jun 12, 2019 at 07:44:12PM +0000, Nadav Amit wrote:

> I have run into similar problems before.
> I had two problematic scenarios. In the first case, I had a "call" in the
> middle of the patched code-block, but this call was always followed by a
> "jump" to the end of the potentially patched code-block, so I did not have
> the problem.
> In the second case, I had an indirect call (which is shorter than a direct

Longer, 6 bytes vs 5 if I'm not mistaken.

> call) being patched into a direct call. In this case, I preceded the
> indirect call with NOPs so indeed the indirect call was at the end of the
> patched block.
> In certain cases, if a shorter instruction should be potentially patched
> into a longer one, the shorter one can be preceded by some prefixes. If
> there are multiple REX prefixes, for instance, the CPU only uses the last
> one, IIRC. This can allow to avoid synchronize_sched() when patching a
> single instruction into another instruction with a different length.
> Not sure how helpful this information is, but sharing - just in case.

I think we can patch multiple instructions provided:

- all but one of the instructions are NOPs,
- there are no branch targets inside the range.

By poking INT3 at every instruction in the range and then doing the
machine-wide IPI+SYNC, we'll trap every CPU that is inside the range.

Because all but one of the instructions are NOPs, we only need to emulate
that one instruction (assuming the real instruction is always last);
otherwise we emulate a NOP when we trap behind the real instruction.
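
As a rough illustration of that trap decision (a userspace sketch, not
kernel code): the struct, enum and function names below are hypothetical,
and the rule "at or before the real instruction: emulate it; past it: only
NOPs remain, so resume at the end of the range" is one reading of the
paragraph above, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a patched range with one non-NOP insn. */
struct patch_range {
	size_t len;       /* total bytes in the range */
	size_t real_off;  /* offset of the single non-NOP instruction */
	size_t real_len;  /* length of that instruction in bytes */
};

enum trap_action {
	TRAP_EMULATE_INSN,  /* emulate the one real instruction */
	TRAP_SKIP_TO_END,   /* only NOPs left: resume after the range */
};

/*
 * trap_off is the offset (within the range) of the INT3 that fired.
 * It must be an instruction boundary, so it cannot point into the
 * middle of the real instruction.
 */
static enum trap_action resolve_trap(const struct patch_range *r,
				     size_t trap_off)
{
	assert(trap_off < r->len);
	assert(trap_off <= r->real_off ||
	       trap_off >= r->real_off + r->real_len);

	/* Everything between trap_off and the real insn is a NOP. */
	if (trap_off <= r->real_off)
		return TRAP_EMULATE_INSN;

	/* We are behind the real instruction: the rest is NOPs. */
	return TRAP_SKIP_TO_END;
}
```

If the real instruction is always last, the second case never happens and
every trap resolves to emulating that one instruction.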

Then we can write new instructions, leaving the initial INT3 until last.
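
The write ordering can be sketched in plain userspace C as below. This is
only a model under stated assumptions: poke_multi() and sync_all_cpus() are
hypothetical names, sync_all_cpus() merely counts calls where the real thing
would do the machine-wide IPI+SYNC, and the buffer is ordinary memory rather
than live kernel text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC  /* x86 single-byte breakpoint instruction */

/* Hypothetical stand-in for the machine-wide IPI + SYNC. */
static unsigned int sync_count;

static void sync_all_cpus(void)
{
	sync_count++;
}

/*
 * text:       bytes currently in place (modified in place)
 * new_text:   replacement bytes, same length as the range
 * len:        size of the range in bytes
 * old_starts: offsets of every instruction start in the current text;
 *             old_starts[0] must be 0
 */
static void poke_multi(uint8_t *text, const uint8_t *new_text, size_t len,
		       const size_t *old_starts, size_t nr_starts)
{
	size_t i;

	assert(len > 0 && nr_starts > 0 && old_starts[0] == 0);

	/* 1) INT3 on every instruction boundary, then sync. */
	for (i = 0; i < nr_starts; i++) {
		assert(old_starts[i] < len);
		text[old_starts[i]] = INT3_OPCODE;
	}
	sync_all_cpus();

	/* 2) Write the new bytes, but keep the initial INT3 in place. */
	if (len > 1) {
		memcpy(text + 1, new_text + 1, len - 1);
		sync_all_cpus();
	}

	/* 3) Finally replace the initial INT3 with the new first byte. */
	text[0] = new_text[0];
	sync_all_cpus();
}
```

For example, a range of five single-byte NOPs (0x90) with instruction starts
at offsets 0..4 could be turned into a 5-byte direct call (opcode 0xE8 plus
a 32-bit displacement) this way. A CPU entering the range during step 2
still hits the INT3 at offset 0 and never sees a half-written instruction.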

Something like this might be useful if we want to support immediate
instructions (like patch_data_* in paravirt_patch.c) for static_call().