Re: [RFD] x86/split_lock: Request to Intel

From: Thomas Gleixner
Date: Thu Oct 17 2019 - 17:31:44 EST


On Thu, 17 Oct 2019, Sean Christopherson wrote:
> On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> > The more I look at this trainwreck, the less interested I am in merging any
> > of this at all.
> >
> > The fact that it took Intel more than a year to figure out that the MSR is
> > per core and not per thread is yet another proof that this industry just
> > works by pure chance.
> >
> > There is a simple way out of this misery:
> >
> > Intel issues a microcode update which does:
> >
> > 1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> > AND logic, i.e. when one thread disables AC it's automatically
> > disabled on the core.
> >
> > Alternatively it supresses the #AC when the current thread has it
> > disabled.
> >
> > 2) Provide a separate bit which indicates that the AC enable logic is
> > actually AND based or that #AC is supressed when the current thread
> > has it disabled.
> >
> > Which way I don't really care as long as it makes sense.
>
> The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
> one CPU are immediately visible on its sibling CPU.

That's less horrible than I read out of your initial explanation.

Thankfully all of this is meticulously documented in the SDM ...

Though it changes the picture radically. The truly shared MSR allows
regular software synchronization without IPIs and without an insane amount
of corner case handling.

So as you pointed out we need a per core state, which is influenced by:

1) The global enablement switch

2) Host induced #AC

3) Guest induced #AC

A) Guest has #AC handling

B) Guest has no #AC handling

#1:

- OFF: #AC is globally disabled

- ON: #AC is globally enabled

- FORCE: same as ON but #AC is enforced on guests

#2:

If the host triggers an #AC then the #AC has to be force disabled on the
affected core independent of the state of #1. Nothing we can do about
that and once the initial wave of #AC issues is fixed this should not
happen on production systems. That disables #3 even for the #3.A case
for simplicity sake.

#3:

A) Guest has #AC handling

#AC is forwarded to the guest. No further action required aside of
accounting

B) Guest has no #AC handling

If #AC triggers the resulting action depends on the state of #1:

- FORCE: Guest is killed with SIGBUS or whatever the virt crowd
thinks is the appropriate solution

- ON: #AC triggered state is recorded per vCPU and the MSR is
toggled on VMENTER/VMEXIT in software from that point on.

So the only interesting case is #3.B and #1.state == ON. There you need
serialization of the state and the MSR write between the cores, but only
when the vCPU triggered an #AC. Until then, nothing to do.

vmenter()
{
if (vcpu->ac_disable)
this_core_disable_ac();
}

vmexit()
{
if (vcpu->ac_disable) {
this_core_enable_ac();
}

this_core_dis/enable_ac() takes the global state into account and has the
necessary serialization in place.

Thanks,

tglx