Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

From: Borislav Petkov
Date: Tue Feb 23 2021 - 04:44:07 EST


On Tue, Feb 23, 2021 at 10:27:55AM +0800, Aili Yao wrote:
> When a guest accesses an address with a UE error, it exits guest mode,
> the host does the recovery job, and then a SIGBUS is sent to the VCPU.
> qemu catches the signal, which carries only the address and the error
> level but no RIPV, so qemu will assume RIPV is cleared and inject the
> error into the guest OS.

Lemme see:

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)

    /* If we get an action required MCE, it has been injected by KVM
     * while the VM was running. An action optional MCE instead should
     * be coming from the main thread, which qemu_init_sigbus identifies
     * as the "early kill" thread.
     */
    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);

    ...

    kvm_mce_inject(cpu, paddr, code);

in that function:

    if (code == BUS_MCEERR_AR) {
        status |= MCI_STATUS_AR | 0x134;
        mcg_status |= MCG_STATUS_EIPV;
    } else {
        status |= 0xc0;
        mcg_status |= MCG_STATUS_RIPV;
    }

That looks like a valid RIP bit to me. Then cpu_x86_inject_mce() gets
that mcg_status and injects it into the guest.
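
To make the mapping explicit, here is a minimal standalone sketch of just
the mcg_status half of that branch. It is not the qemu code: the helper
name and the SKETCH_* constants are made up for illustration, the numeric
values are what I assume Linux uses for the SIGBUS si_code and what the
SDM gives for the IA32_MCG_STATUS bits, and starting mcg_status at 0 is
also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SIGBUS si_code values as used by Linux for memory failures. */
#define SKETCH_BUS_MCEERR_AR 4   /* action required */
#define SKETCH_BUS_MCEERR_AO 5   /* action optional */

/* Assumed bit positions in IA32_MCG_STATUS. */
#define SKETCH_MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */
#define SKETCH_MCG_STATUS_EIPV (1ULL << 1)   /* error IP valid */

/*
 * Hypothetical helper: returns only the MCG_STATUS bits that the quoted
 * branch ORs in for a given si_code. The initial value of 0 is an
 * assumption of this sketch, not something taken from qemu.
 */
static uint64_t sketch_mcg_status_for(int code)
{
    uint64_t mcg_status = 0;

    /* Same precondition as the assert in the quoted qemu snippet. */
    assert(code == SKETCH_BUS_MCEERR_AR || code == SKETCH_BUS_MCEERR_AO);

    if (code == SKETCH_BUS_MCEERR_AR)
        mcg_status |= SKETCH_MCG_STATUS_EIPV;
    else
        mcg_status |= SKETCH_MCG_STATUS_RIPV;

    return mcg_status;
}
```

In this sketch the only input is the si_code: BUS_MCEERR_AR maps to the
EIPV bit and BUS_MCEERR_AO maps to the RIPV bit.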

So I can't follow your claim - qemu does handle RIPV just fine, it
seems.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette