Re: [PATCH v2] x86/mm: Don't disable INVLPG if "incomplete Global INVLPG flushes" is fixed by microcode

From: Andrew Cooper
Date: Thu Apr 04 2024 - 12:48:21 EST


On 04/04/2024 5:18 pm, Sean Christopherson wrote:
> On Mon, Mar 25, 2024, Michael Kelley wrote:
>>> static void setup_pcid(void)
>>> {
>>> + const struct x86_cpu_id *invlpg_miss_match;
>>> +
>>> if (!IS_ENABLED(CONFIG_X86_64))
>>> return;
>>>
>>> if (!boot_cpu_has(X86_FEATURE_PCID))
>>> return;
>>>
>>> - if (x86_match_cpu(invlpg_miss_ids)) {
>>> + invlpg_miss_match = x86_match_cpu(invlpg_miss_ids);
>>> + if (invlpg_miss_match &&
>>> + invlpg_miss_match->driver_data > boot_cpu_data.microcode) {
>>> pr_info("Incomplete global flushes, disabling PCID");
>>> setup_clear_cpu_cap(X86_FEATURE_PCID);
>>> return;
>> As noted in similar places where microcode versions are
>> checked, hypervisors often lie to guests about microcode versions.
>> For example, see comments in bad_spectre_microcode(). I
>> know Hyper-V guests always see the microcode version as
>> 0xFFFFFFFF (max u32 value). So in a Hyper-V guest the above
>> code will always leave PCID enabled.
> Enumerating broken PCID support to a guest is very arguably a hypervisor bug.
> Hypervisors also lie to guest about FMS. As KVM *user* with affected hardware
> (home box), I would want the kernel to assume PCID works if X86_FEATURE_HYPERVISOR
> is present.

I have very mixed feelings about all of this.

The Gracemont INVLPG vs PCID bug was found in the field, so PCID will
have been enumerated to guests at that point.  You can't blindly drop
PCID until the VM next reboots.

A related example.  I wrote the patch to hide XSAVES to work around an
AMD erratum where XSAVEC sufficed, and the consequences were so dire for
some versions of Windows that there was a suggestion to simply revert
the workaround to make VMs run again.  Windows intentionally asserts
sanity (== expectations) in what it can see; I have no idea whether it
would object in this case but hiding PCID is definitely playing with fire.

I am frequently dismayed at how many FMS abuses there are in Linux, and
I'm this >< close to talking a leaf out of HyperV's book and poisoning
FMS to 0 or ~0 just like the microcode revision.   Any use of FMS for
anything other than diagnostic purposes under virt is live migration bug
waiting to happen.

Xen's current TLB algorithms ensure never to have both PCID and PGE
active together, so we managed to dodge this particular mess.  But as a
consequence, we've got no logic to spot it, or to consider changing PCID
visibility.  That said, for better or worse, the ucode revision is
visible (for now), and a guest which polls the revision will even spot
the hypervisor doing a late ucode load.

Sorry I don't have any better suggestions.  The only nice(ish) of fixing
this for the guest kernel is for Intel to allocate a $FOO_NO bit, and
it's horrible in every other way.

~Andrew