Re: Tracing text poke / kernel self-modifying code (Was: Re: [RFC v2 0/6] x86: dynamic indirect branch promotion)

From: Adrian Hunter
Date: Thu Aug 29 2019 - 05:42:12 EST


On 29/08/19 11:53 AM, Peter Zijlstra wrote:
> On Thu, Aug 29, 2019 at 11:23:52AM +0300, Adrian Hunter wrote:
>> On 9/01/19 12:35 PM, Peter Zijlstra wrote:
>>> On Tue, Jan 08, 2019 at 12:47:42PM -0800, Nadav Amit wrote:
>>>
>>>> A general solution is more complicated, however, due to the racy nature of
>>>> cross-modifying code. There would need to be TSC recording of the time
>>>> before the modifications start and after they are done.
>>>>
>>>> BTW: I am not sure that static-keys are much better. Their change also
>>>> affects the control flow, and they do affect the control flow.
>>>
>>> Any text_poke() user is a problem; which is why I suggested a
>>> PERF_RECORD_TEXT_POKE that emits the new instruction. Such records are
>>> timestamped and can be correlated to the trace.
>>>
>>> As to the racy nature of text_poke, yes, this is a wee bit tricky and
>>> might need some care. I _think_ we can make it work, but I'm not 100%
>>> sure on exactly how PT works, but something like:
>>>
>>> - write INT3 byte
>>> - IPI-SYNC
>>>
>>> and ensure the poke_handler preserves the existing control flow (which
>>> it currently does not, but should be possible).
>>>
>>> - emit RECORD_TEXT_POKE with the new instruction
>>>
>>> at this point the actual control flow will be through the INT3 and
>>> handler and not hit the actual instruction, so the actual state is
>>> irrelevant.
>>>
>>> - write instruction tail
>>> - IPI-SYNC
>>> - write first byte
>>> - IPI-SYNC
>>>
>>> And at this point we start using the new instruction, but this is after
>>> the timestamp from the RECORD_TEXT_POKE event and decoding should work
>>> just fine.
>>>
>>
>> Presumably the IPI-SYNC does not guarantee that other CPUs will not already
>> have seen the change. In that case, it is not possible to provide a
>> timestamp before which all CPUs executed the old code, and after which all
>> CPUs execute the new code.
>
> 'the change' is an INT3 poke, so either you see the old code flow, or
> you see an INT3 emulate the old flow in your trace.
>
> That should be unambiguous.
>
> Then you emit the RECORD_TEXT_POKE with the new instruction on. This
> prepares the decoder to accept a new reality.
>
> Then we finish the instruction poke.
>
> And then when the trace no longer shows INT3 exceptions, you know the
> new code is in effect.
>
> How is this ambiguous?

It's not. I didn't get that from the first read, sorry.

Can you expand on "and ensure the poke_handler preserves the existing
control flow"? Whatever the INT3-handler does will be traced normally so
long as it does not itself execute self-modified code.