Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration

From: Mathieu Desnoyers
Date: Tue Nov 14 2017 - 11:48:45 EST


----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> > I've tried to create a small single-threaded self-modifying loop in
>> > user-space to trigger a trace cache or speculative execution quirk,
>> > but I have not succeeded yet. I suspect that I would need to know
>> > more about the internals of the processor architecture to create the
>> > right stalls that would allow speculative execution to move further
>> > ahead, and trigger an incoherent execution flow. Ideas on how to
>> > trigger this would be welcome.
>>
>> I thought the whole problem was by definition multi-threaded.
>>
>> Single-threaded stuff can't get out of sync with itself; you'll always
>> observe your own stores.
>
> And even if you could, you can always execute a local serializing
> instruction like CPUID to force things.
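For reference, here is a minimal sketch of issuing such a local serializing instruction from user space. It assumes GCC or Clang inline assembly on x86. The helper name serialize_core() is hypothetical. Leaf 0 is an arbitrary choice, since CPUID serializes whichever leaf is requested.

```c
#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch uses the x86 CPUID instruction"
#endif

/*
 * Hypothetical helper: execute CPUID as a local core serializing
 * instruction. A serializing instruction forces all prior instructions
 * to complete, and buffered writes to drain, before the next
 * instruction is fetched and executed. The "memory" clobber also stops
 * the compiler from moving memory accesses across it.
 * Leaf 0 returns the highest supported basic CPUID leaf in EAX.
 */
static inline unsigned int serialize_core(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;

    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    return eax;
}
```

Callers normally ignore the return value. It is returned only so the call cannot be mistaken for dead code.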

What I'm trying to reproduce is something that breaks in the single-threaded
case if I deliberately leave out the CPUID core serializing instruction
when modifying code that is about to be executed, in a loop.

AFAIU, Intel requires a core serializing instruction to be issued even
in single-threaded scenarios between code update and execution, to ensure
that speculative execution does not observe incoherent code. Now the
question we all have for Intel is: is this requirement too strong, or
does real hardware actually need it?
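The loop described above can be sketched as follows, with the CPUID left in. Removing the cpuid_serialize() call gives the variant being tested. This is only a sketch under several assumptions. It targets x86, where the bytes B8 imm32 C3 encode "mov eax, imm32; ret". It also assumes a POSIX system that permits an anonymous mapping that is both writable and executable. The names smc_loop() and cpuid_serialize() are made up for illustration. The sketch does not trigger any incoherence by itself. It only shows where the required serializing instruction sits between the code update and its execution.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#if !defined(__x86_64__) && !defined(__i386__)
#error "this sketch patches x86 machine code"
#endif

/* Hypothetical helper: CPUID used as a local core serializing instruction. */
static inline void cpuid_serialize(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;

    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
}

typedef int (*gen_fn)(void);

/*
 * Hypothetical test loop: repeatedly rewrite the immediate operand of
 * "mov eax, imm32; ret", serialize, then call the patched code.
 * Returns the number of calls that did not observe the new immediate,
 * or -1 if the writable+executable mapping could not be created.
 */
static long smc_loop(int iterations)
{
    unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    buf[0] = 0xB8;              /* mov eax, imm32 (imm32 at buf[1..4]) */
    buf[5] = 0xC3;              /* ret */
    gen_fn fn = (gen_fn)(void *)buf;
    long mismatches = 0;

    for (int i = 0; i < iterations; i++) {
        int32_t imm = i;
        memcpy(buf + 1, &imm, sizeof(imm)); /* modify the upcoming code */
        cpuid_serialize();                  /* serialize before executing it */
        if (fn() != i)
            mismatches++;
    }

    munmap(buf, 4096);
    return mismatches;
}
```

With the CPUID present, smc_loop(1000) is expected to return 0. A non-zero count would mean some call executed the stale immediate.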

Thanks,

Mathieu

>
>> And ISTR the JIT scenario being something like the JIT overwriting
>> previously executed but supposedly no longer used code. And in this
>> scenario you'd want to guarantee all CPUs observe the new code before
>> jumping into it.
>>
>> The current approach is using mprotect(), except that on a number of
>> platforms the TLB invalidate from that is not guaranteed to be strong
>> enough to sync for code changes.
>>
>> On x86 the mprotect() should work just fine, since we broadcast IPIs for
>> the TLB invalidate and the IRET from those will get things synced up
>> again (if nothing else does; very likely we'll have done a MOV-CR3, which
>> is itself serializing).
>>
>> But PowerPC, s390, ARM et al., which do TLB invalidates without
>> interrupts and don't guarantee that their TLB invalidates sync against
>> the execution units, are left broken by this scheme.
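Here is a sketch of the mprotect() approach described above. It assumes a JIT that keeps its code in a page-aligned region mapped PROT_READ|PROT_EXEC and makes that region writable only while patching. jit_publish() is a hypothetical name. The sketch inherits the limitation discussed here. Any cross-CPU resynchronization comes only from the TLB invalidation the kernel performs for the protection change. Per the discussion, that is enough on x86 (IPIs followed by IRET) but not on PowerPC, s390 or ARM.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical JIT helper: overwrite code_len bytes at region + offset
 * inside an executable region, bracketing the write with mprotect().
 * The scheme relies on the final protection change causing a TLB
 * invalidation on the other CPUs running this mm, which on x86 is done
 * with IPIs. It is NOT a portable cross-modification barrier.
 * Returns 0 on success, -1 on bad arguments or mprotect() failure.
 */
static int jit_publish(void *region, size_t region_len, size_t offset,
                       const void *code, size_t code_len)
{
    long pg = sysconf(_SC_PAGESIZE);

    /* mprotect() requires a page-aligned start address. */
    if (pg <= 0 || (uintptr_t)region % (uintptr_t)pg != 0)
        return -1;
    if (offset > region_len || code_len > region_len - offset)
        return -1;

    /* Drop exec permission and make the region writable for patching. */
    if (mprotect(region, region_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy((char *)region + offset, code, code_len);

    /* Restore exec; this protection change is what the scheme relies on. */
    if (mprotect(region, region_len, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return 0;
}
```

Other threads are assumed to jump into the new code only after jit_publish() has returned.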

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com