Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

From: Nadav Amit
Date: Mon Jun 17 2019 - 13:12:06 EST

> On Jun 17, 2019, at 7:42 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, Jun 12, 2019 at 07:44:12PM +0000, Nadav Amit wrote:
>> I have run into similar problems before.
>> I had two problematic scenarios. In the first case, I had a "call" in the
>> middle of the patched code-block, but this call was always followed by a
>> "jump" to the end of the potentially patched code-block, so I did not have
>> the problem.
>> In the second case, I had an indirect call (which is shorter than a direct
> Longer, 6 bytes vs 5 if I'm not mistaken.

Shorter (2-3 bytes IIRC), since the target was held in a register.
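
For reference, a small sketch of the encodings in question. The byte values
are the standard x86-64 encodings; the array names and the nop_padding()
helper are made up for illustration. The 6-byte figure matches the
memory-operand form, the 2-3 byte figure the register form:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct call: opcode E8 followed by a 32-bit relative displacement. */
const uint8_t call_rel32[]   = { 0xe8, 0x00, 0x00, 0x00, 0x00 };

/* Indirect call through a legacy register: call *%rax (FF /2, ModRM D0). */
const uint8_t call_reg_rax[] = { 0xff, 0xd0 };

/* Indirect call through r8-r15 needs a REX.B prefix: call *%r11. */
const uint8_t call_reg_r11[] = { 0x41, 0xff, 0xd3 };

/* Indirect call through memory: call *disp32(%rip) (FF /2, ModRM 15). */
const uint8_t call_mem_rip[] = { 0xff, 0x15, 0x00, 0x00, 0x00, 0x00 };

/*
 * Hypothetical helper: number of single-byte NOPs to place in front of a
 * short instruction so that it ends exactly where the longer replacement
 * would end.
 */
size_t nop_padding(size_t short_len, size_t long_len)
{
	return long_len > short_len ? long_len - short_len : 0;
}
```

With these sizes, a `call *%rax` that may later become a direct call would be
preceded by nop_padding(2, 5) = 3 bytes of NOP, which is the layout described
in the quoted text below.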

>> call) being patched into a direct call. In this case, I preceded the
>> indirect call with NOPs so indeed the indirect call was at the end of the
>> patched block.
>> In certain cases, if a shorter instruction should be potentially patched
>> into a longer one, the shorter one can be preceded by some prefixes. If
>> there are multiple REX prefixes, for instance, the CPU only uses the last
>> one, IIRC. This can allow to avoid synchronize_sched() when patching a
>> single instruction into another instruction with a different length.
>> Not sure how helpful this information is, but sharing - just in case.
> I think we can patch multiple instructions provided:
> - all but one instruction are a NOP,
> - there are no branch targets inside the range.
> By poking INT3 at every instruction in the range and then doing the
> machine wide IPI+SYNC, we'll trap every CPU that is in-side the range.
> Because all but one instruction are a NOP, we can emulate only the one
> instruction (assuming the real instruction is always last), otherwise
> NOP when we're behind the real instruction.
> Then we can write new instructions, leaving the initial INT3 until last.
> Something like this might be useful if we want to support immediate
> instructions (like patch_data_* in paravirt_patch.c) for static_call().
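
To make sure we are talking about the same sequence, here is a rough
user-space model of the proposal quoted above, operating on a plain byte
buffer. It is only a sketch: poke_range(), sync_all_cpus() and sync_count are
hypothetical names, the sync step is a stub that merely counts calls, and the
INT3 handler side (emulate vs. NOP) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xcc

/* Hypothetical stand-in for the machine-wide IPI + serialization step. */
unsigned int sync_count;

void sync_all_cpus(void)
{
	sync_count++;
}

/*
 * text:     the len bytes being patched
 * new_text: the len replacement bytes
 * insn_off: offsets of every old instruction start in the range,
 *           with insn_off[0] == 0
 */
void poke_range(uint8_t *text, const uint8_t *new_text, size_t len,
		const size_t *insn_off, size_t nr_insn)
{
	size_t i;

	/* 1. INT3 at every old instruction boundary, then sync. */
	for (i = 0; i < nr_insn; i++)
		text[insn_off[i]] = INT3_OPCODE;
	sync_all_cpus();

	/* 2. Write the new bytes, leaving the initial INT3 in place. */
	if (len > 1)
		memcpy(text + 1, new_text + 1, len - 1);
	sync_all_cpus();

	/* 3. Replace the initial INT3 last. */
	text[0] = new_text[0];
	sync_all_cpus();
}
```

For example, patching three 1-byte NOPs followed by `call *%rax` (90 90 90 ff
d0) into a 5-byte direct call would pass insn_off = {0, 1, 2, 3} and len = 5.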

I don't know what you mean by SYNC, but if you mean something like
sync_core() (as opposed to something like synchronize_sched()), I am not
sure it is sufficient.

Using IPI+sync_core() would, I think, assume either that IRQs are never
enabled inside IRQ and exception handlers, or that these handlers are never
invoked while the patched code is executing. Otherwise, the IPI might be
received inside an IRQ/exception handler that interrupted the patched range,
and the return from that handler would then land in the middle of a patched
instruction.