Re: Official documentation from Intel stating that poking INT3 (single-byte) concurrently is OK ?

From: Mathieu Desnoyers
Date: Wed Feb 22 2023 - 11:41:23 EST


On 2023-02-22 04:20, Peter Zijlstra wrote:
> On Tue, Feb 21, 2023 at 01:42:58PM -0500, Mathieu Desnoyers wrote:
>> On 2023-02-21 12:50, Steven Rostedt wrote:
>>> On Tue, 21 Feb 2023 11:44:42 -0500
>>> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> I have emails from you dating from a few years back unofficially stating
>>>> that it's OK to update the first byte of an instruction with a single-byte
>>>> int3 concurrently:
>>>>
>>>> https://lkml.indiana.edu/hypermail/linux/kernel/1001.1/01530.html
>>>>
>>>> It is referred in the original implementation of text_poke_bp():
>>>> commit fd4363fff3d9 ("x86: Introduce int3 (breakpoint)-based instruction patching")
>>>>
>>>> Olivier Dion is working on the libpatch [1,2] project aiming to use this
>>>> property for low-latency/low-overhead live code patching in user-space as
>>>> well, but we cannot find an official statement from Intel that guarantees
>>>> this breakpoint-bypass technique is indeed OK without stopping the world
>>>> while patching.
>>>>
>>>> Do you know where I could find an official statement of this guarantee ?
>>>
>>> The fact that we have been using it for over 10 years without issue should
>>> be a good guarantee ;-)
>>>
>>> I know you probably prefer an official statement, and I thought they
>>> actually gave one, but can't seem to find it.
>>
>> I recall an in-person discussion with Peter Anvin shortly after he got the
>> official confirmation, but I cannot find any public trace of it. I suspect
>> Intel may have documented this internally only.
>
> My 2ct, ISTR this also having been vetted by AMD, perhaps they did
> manage to write it down somewhere.

Good point! I did not find a statement specifically about the breakpoint bypass, but by piecing together the explanations from their manual, I think we can conclude that it is safe:

Based on AMD64 Architecture Programmer’s Manual Volume 2
7.6.1 Cache Organization and Operation
Cross-Modifying Code

The subsection "Asynchronous modification" describes in detail what happens when we update an instruction while another thread is executing it. The good news is that there is no mention of an evil Boeman triggering any kind of general protection fault when updating instructions concurrently with their execution. So inserting a single-byte breakpoint as the first byte of an instruction is just the simplest scenario covered by that section:

"Such modifications must be done via a single store to the target thread's instruction stream that is contained entirely within a naturally-aligned quadword, and is subject to the constraints given here. A key aspect is that, although the store is performed atomically, the affected quadword may be read more than once in the process of extracting instruction bytes from it. This can result in the following scenarios resulting from a single store:

[...]

2. A modification to one instruction A that changes it to two instructions A'-B will only result in execution of A'-B.

[...]"
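
To make the alignment constraint from that quote concrete, here is a minimal sketch in C of the check a patcher could run before attempting such a store. The name fits_in_aligned_quadword() is hypothetical (it comes neither from the manual, nor from the kernel, nor from libpatch), and it only tests the "contained entirely within a naturally-aligned quadword" condition, not the other constraints the manual lists:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A quadword is 8 bytes in AMD's terminology. */
#define QUADWORD_SIZE 8u

static_assert(QUADWORD_SIZE == sizeof(uint64_t), "quadword must be 8 bytes");

/*
 * Hypothetical helper: returns true when the byte range
 * [addr, addr + len) lies entirely inside one naturally-aligned
 * quadword, i.e. when it does not cross an 8-byte boundary.
 */
bool fits_in_aligned_quadword(uintptr_t addr, size_t len)
{
	if (len == 0 || len > QUADWORD_SIZE)
		return false;
	/* Offset of the first byte within its quadword, plus the length. */
	return (addr % QUADWORD_SIZE) + len <= QUADWORD_SIZE;
}
```

A single-byte int3 store has len == 1, so it passes this check at any address, which is why it is the simplest case.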

Then there is the "Synchronous modification" section which basically describes how serializing instructions can be issued before proceeding to execute the modified instructions.

So AFAIU the XMC breakpoint insertion without stopping the world is covered by AMD's "Asynchronous modification" section, and the rest of the breakpoint-bypass technique, which uses serializing instructions (issued through IPIs in the kernel, and through membarrier sync-core in userspace), is guaranteed by the "Synchronous modification" section.
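
For reference, the ordering of the breakpoint-bypass steps can be modelled roughly as below. This is only a simplified sketch under stated assumptions, not the kernel's text_poke_bp() nor libpatch code: poke_with_breakpoint(), sync_cores_fn and count_sync() are hypothetical names, the buffer is treated as plain writable memory, and the breakpoint handler that diverts threads hitting the int3 is left out. The sync_cores callback stands in for whatever makes every other core execute a serializing instruction (IPIs in the kernel, membarrier sync-core in userspace). __atomic_store_n is a GCC/Clang builtin, used here to get a single-byte store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC /* single-byte x86 breakpoint instruction */

/*
 * Hypothetical callback: must not return before every other core has
 * executed a serializing instruction.
 */
typedef void (*sync_cores_fn)(void *ctx);

/*
 * Replace the len-byte instruction at insn with new_insn:
 *   1. store int3 over the first byte, then synchronize;
 *   2. write every byte except the first, then synchronize;
 *   3. store the new first byte over the int3, then synchronize.
 */
void poke_with_breakpoint(uint8_t *insn, const uint8_t *new_insn,
			  size_t len, sync_cores_fn sync_cores, void *ctx)
{
	assert(len >= 1);

	/* Step 1: the asynchronous single-byte modification. */
	__atomic_store_n(&insn[0], INT3_OPCODE, __ATOMIC_RELEASE);
	sync_cores(ctx);

	/* Step 2: the tail is only rewritten while int3 guards the entry. */
	if (len > 1) {
		memcpy(insn + 1, new_insn + 1, len - 1);
		sync_cores(ctx);
	}

	/* Step 3: remove the breakpoint by writing the new first byte. */
	__atomic_store_n(&insn[0], new_insn[0], __ATOMIC_RELEASE);
	sync_cores(ctx);
}

/* Stand-in for a real synchronization: only counts how often it ran. */
void count_sync(void *ctx)
{
	++*(int *)ctx;
}
```

Only step 1 relies on the "Asynchronous modification" rules; steps 2 and 3 happen behind the breakpoint and depend on the synchronization between steps.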

Unfortunately, I cannot find anything stated as clearly in Intel's documentation with respect to asynchronous cross-modification of code.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com