Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration

From: Linus Torvalds
Date: Mon Nov 13 2017 - 12:14:49 EST


On Mon, Nov 13, 2017 at 8:56 AM, Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> I figured out what you're pointing to: if exec() is executed by a previously
> running thread, and there is no core serializing instruction between program
> load and return to user-space, the kernel ends up acting like a JIT, indeed.

Well, exec() is actually the least of our problems, because it will
have caused the virtual mapping to be set up too.

But we have had cases that haven't had that basically forever. Your
example of user-space doing an _unintentional_ cross-modification is
just such a case, but so is anybody doing their own code management in
user space by just reading their own executable into memory etc.

So part of the problem is that it's perfectly valid to generate code
and then just jump to it in x86 space as long as you stay on the same
CPU. And there has never been any guarantee that you wouldn't be
migrated in between.
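
A minimal sketch of that "generate code and just jump to it" pattern,
assuming Linux on x86 and a system that permits writable+executable
anonymous mappings. The helper name jit_return_42 is hypothetical and
is not part of the patch under discussion. Nothing here pins the thread
to a CPU or issues a serializing instruction between the write and the
jump, which is exactly the situation being described.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: write a tiny function into fresh memory and
 * call it from the same thread.
 *
 * Returns 0 on success and stores the generated function's result in
 * *out. Returns -1 if the architecture is not x86 or the mapping fails.
 */
int jit_return_42(int *out)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Machine code for: mov eax, 42 ; ret */
	static const unsigned char code[] = {
		0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3
	};
	size_t len = 4096;
	void *buf;
	int (*fn)(void);

	/* Assumption: the system allows a W+X anonymous mapping. */
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* "Generate" the code by storing the bytes as ordinary data. */
	memcpy(buf, code, sizeof(code));

	/* No serializing instruction: just jump to what we wrote. */
	fn = (int (*)(void))buf;
	*out = fn();

	munmap(buf, len);
	return 0;
#else
	(void)out;
	return -1;	/* sketch only covers x86 */
#endif
}
```

A caller would pass a pointer to an int and check the return value
before using the result. If the scheduler moved the thread to another
CPU between the memcpy() and the indirect call, the store and the
instruction fetch would happen on different cores, which is the
cross-CPU case the rules are worried about.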

In _practice_, I suspect that migration events are much much too big
for this to be an issue at all. And the trigger for migration is going
to be something like a timer interrupt that causes us to reschedule in
the first place - which ends up serializing due to the iret. And even
if the rescheduling is done by one CPU just doing a "schedule()", us
doing a re-balancing of CPUs, and another CPU then picking up the
process, there's been tens of thousands of instructions, several
spinlocks, lots of cross-CPU synchronization etc going on.

I do not believe for a second that the CPU prefetching queue will be
active over those kinds of ranges and events.

So I don't really think the problem can actually occur in the first
place. I think the SDK rules are garbage.

But that's exactly why I'd actually really want to get some more real
rules from Intel and AMD. Because I think your patch is pointless, and
doesn't really fix anything in reality, but it's triggered by reading
the Intel SDK and going "in theory, this means that we would need to
do XYZ".

And when theory and practice do not match, I think (a) the theory is
bad, and (b) reality trumps theory.

In this case (b) means that I'm not super-eager to apply the patch,
and (a) means that since the theory is based on the Intel SDK, I think
we should consider the Intel SDK to be a problem, and ask for
clarification of just what the rules really are.

Linus