Re: [RFD] x86/split_lock: Request to Intel

From: Xiaoyao Li
Date: Fri Oct 18 2019 - 06:20:52 EST


On 10/18/2019 5:02 PM, Thomas Gleixner wrote:
On Fri, 18 Oct 2019, Xiaoyao Li wrote:
On 10/17/2019 8:29 PM, Thomas Gleixner wrote:
The more I look at this trainwreck, the less interested I am in merging any
of this at all.

The fact that it took Intel more than a year to figure out that the MSR is
per core and not per thread is yet another proof that this industry just
works by pure chance.


Whether it's per-core or per-thread doesn't much affect how we implement this
for host/native.

How useful.

OK. IIUC, we can agree on the following usage model for native:

We enable #AC on all cores/threads to detect split locks.
- If user space causes #AC, send SIGBUS to it.
- If the kernel causes #AC, globally disable #AC on all cores/threads, WARN, and let the kernel keep working. (Disabling #AC only on the thread that generated it doesn't help, since the buggy kernel code can run on any thread, hence disabling #AC on all of them.)

As described above, it is either enabled globally or disabled globally, so whether it's per-core or per-thread really doesn't matter.
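
To make the native model above concrete, here is a rough userspace sketch in C. It is not kernel code: NR_CPUS, sld_enabled[], handle_split_lock_ac() and the enum names are all made up for illustration, and the per-CPU flag array only stands in for whatever state the real MSR handling would keep.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical number of logical CPUs in this model */

enum ac_origin { AC_FROM_USER, AC_FROM_KERNEL };
enum ac_action { AC_SEND_SIGBUS, AC_DISABLE_ALL_AND_WARN };

/* Model state: split-lock #AC starts out enabled on every logical CPU. */
bool sld_enabled[NR_CPUS] = { true, true, true, true };

/* Turn detection off everywhere; in this model it is never per-thread. */
void sld_disable_all(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS; cpu++)
		sld_enabled[cpu] = false;
}

/*
 * Decide what to do for a split-lock #AC:
 *  - user space: report SIGBUS, detection stays on;
 *  - kernel: disable detection on all CPUs and report a WARN.
 */
enum ac_action handle_split_lock_ac(enum ac_origin origin)
{
	if (origin == AC_FROM_USER)
		return AC_SEND_SIGBUS;

	sld_disable_all();
	return AC_DISABLE_ALL_AND_WARN;
}
```

Because the only transitions are "on everywhere" and "off everywhere", nothing in this sketch depends on whether the enable bit is scoped per core or per thread.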

And also, no matter whether it's per-core or per-thread, we can always do
something in VIRT.

It matters a lot. If it would be per thread then we would not have this
discussion at all.

Indeed, it's the fact that the control MSR bit is per-core that causes this discussion. But the per-core scope only makes this feature difficult or impossible to virtualize.

We could decide not to expose it to the guest to avoid the really bad thing. However, even if we don't expose this feature to the guest and don't virtualize it, the problem below is still there.

If you think it's not a problem and it's acceptable to add an option to let KVM disable the host's #AC detection, we can just do it that way. Then we can design the virtualization part without any change to the native design at all.
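
As a rough illustration of such an option, here is a userspace sketch in C. It is only a model under assumptions: GUEST_AC_DEFAULT_DISABLE_HOST stands in for a build-time default, guest_ac_disable_host for a KVM module parameter, and host_sld_on for the host's detection state. None of these names exist in KVM. The alternative to disabling host detection is taken to be killing the guest.

```c
#include <stdbool.h>

/* Hypothetical build-time default, standing in for a CONFIG option. */
#ifndef GUEST_AC_DEFAULT_DISABLE_HOST
#define GUEST_AC_DEFAULT_DISABLE_HOST 0
#endif

enum guest_ac_result { GUEST_AC_KILL_GUEST, GUEST_AC_HOST_DETECTION_OFF };

/* Hypothetical runtime knob, standing in for a KVM module parameter. */
bool guest_ac_disable_host = GUEST_AC_DEFAULT_DISABLE_HOST;

/* Model state: is split-lock #AC detection currently on in the host? */
bool host_sld_on = true;

/*
 * Called when a split-lock #AC is taken while a guest is running.
 * Either the host gives up its own detection so the guest can go on,
 * or the guest is killed and host detection stays intact.
 */
enum guest_ac_result handle_guest_split_lock_ac(void)
{
	if (guest_ac_disable_host) {
		host_sld_on = false;
		return GUEST_AC_HOST_DETECTION_OFF;
	}
	return GUEST_AC_KILL_GUEST;
}
```

With the knob off (the default assumed here), an old guest that triggers #AC is killed and the host keeps reporting split locks. With it on, the first such guest silently turns detection off for the whole host, which is exactly the trade-off under discussion.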

Maybe what matters is the following.

Seriously, this makes only sense when it's by default enabled and not
rendered useless by VIRT. Otherwise we never get any reports and none of
the issues are going to be fixed.


VIRT doesn't want old guests to be killed due to #AC. But native doesn't
want VIRT to disable the #AC detection.

I think it's just about the default behavior: whether to disable the
host's #AC detection or kill the guest (SIGBUS or something else) once there
is a split-lock #AC in the guest.

So we can provide a CONFIG option to set the default behavior and a module
parameter to let KVM set/change the default behavior.

Care to read through the whole discussion and figure out WHY it's not that
simple?

Thanks,

tglx