Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration

From: Peter Zijlstra
Date: Tue Nov 14 2017 - 11:06:31 EST


On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
> I've tried to create a small single-threaded self-modifying loop in
> user-space to trigger a trace cache or speculative execution quirk,
> but I have not succeeded yet. I suspect that I would need to know
> more about the internals of the processor architecture to create the
> right stalls that would allow speculative execution to move further
> ahead, and trigger an incoherent execution flow. Ideas on how to
> trigger this would be welcome.

I thought the whole problem was by definition multi-threaded.

Single-threaded stuff can't get out of sync with itself; you'll always
observe your own stores.

And ISTR the JIT scenario being something like the JIT overwriting
previously executed but supposedly no longer used code. And in this
scenario you'd want to guarantee all CPUs observe the new code before
jumping into it.

The current approach is to use mprotect(), except that on a number of
platforms the TLB invalidate it triggers is not guaranteed to be strong
enough to sync code changes.

On x86 the mprotect() should work just fine, since we broadcast IPIs for
the TLB invalidate and the IRET from those will get things synced up
again (if nothing else; very likely we'll have done a MOV-CR3, which is
of course also sufficiently serializing).

But PowerPC, s390, ARM et al., which do TLB invalidates without
interrupts and don't guarantee that their TLB invalidate syncs against
the execution units, are left broken by this scheme.
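
For reference, a rough user-space sketch of the mprotect() scheme
described above: overwrite a previously executed code page, flip it back
to executable, then jump into it. The names jit_publish(),
jit_reuse_demo(), code_v1 and code_v2 are made up for illustration, and
the instruction bytes assume x86-64 or AArch64. It is single-threaded,
so it only shows the mechanics; it does not exercise the cross-CPU case
that is actually at issue here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two tiny functions, "return 1" and "return 2" (illustrative only). */
#if defined(__x86_64__)
/* mov eax, imm32 ; ret */
static const unsigned char code_v1[] = { 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 };
static const unsigned char code_v2[] = { 0xb8, 0x02, 0x00, 0x00, 0x00, 0xc3 };
#define JIT_DEMO_SUPPORTED 1
#elif defined(__aarch64__)
/* mov w0, #imm ; ret */
static const unsigned int code_v1[] = { 0x52800020u, 0xd65f03c0u };
static const unsigned int code_v2[] = { 0x52800040u, 0xd65f03c0u };
#define JIT_DEMO_SUPPORTED 1
#else
static const unsigned char code_v1[] = { 0 };
static const unsigned char code_v2[] = { 0 };
#define JIT_DEMO_SUPPORTED 0
#endif

/*
 * Hypothetical helper: make the page writable, copy in the new code,
 * then make it executable again. The second mprotect() is what causes
 * the TLB invalidate that the scheme relies on.
 */
static int jit_publish(void *page, size_t page_len,
                       const void *code, size_t code_len)
{
    if (code_len > page_len)
        return -1;
    if (mprotect(page, page_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(page, code, code_len);
    /* GCC/Clang builtin; needed for the local CPU on e.g. ARM. */
    __builtin___clear_cache((char *)page, (char *)page + code_len);
    if (mprotect(page, page_len, PROT_READ | PROT_EXEC) != 0)
        return -1;
    return 0;
}

/*
 * Run code_v1, reuse the same page for code_v2, run again.
 * Returns 12 if both versions ran in order, -1 on any failure or on an
 * architecture this sketch has no instruction bytes for.
 */
int jit_reuse_demo(void)
{
    if (!JIT_DEMO_SUPPORTED)
        return -1;

    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0)
        return -1;
    size_t len = (size_t)psz;

    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    int result = -1;
    if (jit_publish(page, len, code_v1, sizeof(code_v1)) == 0) {
        int (*fn)(void) = (int (*)(void))page;
        int a = fn();                     /* old code */
        if (jit_publish(page, len, code_v2, sizeof(code_v2)) == 0) {
            int b = fn();                 /* new code, same address */
            result = a * 10 + b;
        }
    }
    munmap(page, len);
    return result;
}
```

In a multi-threaded JIT the second fn() call could happen on another
CPU, and that is where the strength of the TLB invalidate matters.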