Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration

From: Andy Lutomirski
Date: Tue Nov 14 2017 - 11:11:14 EST


On Tue, Nov 14, 2017 at 8:05 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>> I've tried to create a small single-threaded self-modifying loop in
>> user-space to trigger a trace cache or speculative execution quirk,
>> but I have not succeeded yet. I suspect that I would need to know
>> more about the internals of the processor architecture to create the
>> right stalls that would allow speculative execution to move further
>> ahead, and trigger an incoherent execution flow. Ideas on how to
>> trigger this would be welcome.
>
> I thought the whole problem was by definition multi-threaded.
>
> Single-threaded stuff can't get out of sync with itself; you'll always
> observe your own stores.
>
> And ISTR the JIT scenario being something like the JIT overwriting
> previously executed but supposedly no longer used code. And in this
> scenario you'd want to guarantee all CPUs observe the new code before
> jumping into it.
>
> The current approach is using mprotect(), except that on a number of
> platforms the TLB invalidate from that is not guaranteed to be strong
> enough to sync for code changes.
>
> On x86 the mprotect() should work just fine, since we broadcast IPIs for
> the TLB invalidate and the IRET from those will get the things synced up
> again (if nothing else; very likely we'll have done a MOV-CR3 which will
> of course also have sufficient syncness on it).
>
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts
> and don't guarantee their TLB invalidate sync against execution units
> are left broken by this scheme.
>

Even single-threaded on x86, you can still get in trouble, I think:
do a store, get migrated, then execute the stored code. Because of
lazy TLB handling, there's no actual guarantee that the new CPU does
a CR3 load.
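
A minimal userspace sketch of that store / migrate / execute sequence,
assuming x86-64 Linux with glibc and a system that permits a
writable+executable anonymous mapping. The helper names
(emit_return_imm, migrate_to_other_cpu, store_migrate_execute) are
made up for illustration, and the forced migration via
sched_setaffinity() only stands in for a scheduler-initiated one. It
is not expected to reproduce a failure; it only spells out the
ordering in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <sys/mman.h>

/* Emit x86-64 machine code for "mov eax, imm32; ret" (6 bytes). */
static void emit_return_imm(unsigned char *buf, int value)
{
    buf[0] = 0xb8;                          /* mov eax, imm32 */
    memcpy(buf + 1, &value, sizeof(value)); /* little-endian immediate */
    buf[5] = 0xc3;                          /* ret */
}

/*
 * Pin the calling thread to some allowed CPU other than the current
 * one. Returns 0 if it moved, -1 if no other CPU was usable. The
 * thread stays pinned afterwards; a real test would restore the mask.
 */
static int migrate_to_other_cpu(void)
{
    cpu_set_t allowed, target;
    int cur = sched_getcpu();

    if (cur < 0 || sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (cpu == cur || !CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) == 0)
            return 0;
    }
    return -1;
}

/*
 * Run old code, overwrite it, migrate, run the new code.
 * Returns 0 and stores the result in *out, or -1 if the setup is
 * unavailable (mmap refused, or not built for x86-64).
 */
int store_migrate_execute(int value, int *out)
{
    assert(out != NULL);
#if defined(__x86_64__)
    unsigned char *buf = mmap(NULL, 4096,
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    int (*fn)(void) = (int (*)(void))buf;

    emit_return_imm(buf, ~value);  /* old code ...                    */
    (void)fn();                    /* ... which has now been executed */
    emit_return_imm(buf, value);   /* the store                       */
    (void)migrate_to_other_cpu();  /* best effort; may stay put       */
    *out = fn();                   /* execute the stored code         */

    munmap(buf, 4096);
    return 0;
#else
    (void)value;
    return -1;
#endif
}
```

Calling store_migrate_execute(42, &out) should in practice leave 42
in out on current hardware. The open question is whether anything on
the migration path architecturally guarantees that, not whether it is
easy to observe otherwise.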