Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO PCI passthrough no longer works

From: Tom Lendacky
Date: Tue Dec 10 2024 - 16:07:09 EST


On 12/10/24 14:33, Simon Pilkington wrote:
> On 10/12/2024 16:47, Sean Christopherson wrote:
>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>
>> ---
>> arch/x86/kvm/svm/svm.c | 12 +++++++++---
>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index dd15cc635655..5144d0283c9d 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>> case MSR_AMD64_DE_CFG: {
>> u64 supported_de_cfg;
>>
>> - if (svm_get_feature_msr(ecx, &supported_de_cfg))
>> + if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>> return 1;
>>
>> - if (data & ~supported_de_cfg)
>> + if (data & ~supported_de_cfg) {
>> + pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>> + supported_de_cfg, data);
>> return 1;
>> + }
>>
>> /*
>> * Don't let the guest change the host-programmed value. The
>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>> * are completely unknown to KVM, and the one bit known to KVM
>> * is simply a reflection of hardware capabilities.
>> */
>> - if (!msr->host_initiated && data != svm->msr_decfg)
>> + if (!msr->host_initiated && data != svm->msr_decfg) {
>> + pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>> + svm->msr_decfg, data);
>> return 1;
>> + }
>>
>> svm->msr_decfg = data;
>> break;
>>
>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>
> Relevant dmesg output with some context below. VM locked up as expected.
>
> [ 85.834971] vfio-pci 0000:0c:00.0: resetting
> [ 85.937573] vfio-pci 0000:0c:00.0: reset done
> [ 86.494210] vfio-pci 0000:0c:00.0: resetting
> [ 86.494264] vfio-pci 0000:0c:00.1: resetting
> [ 86.761442] vfio-pci 0000:0c:00.0: reset done
> [ 86.761480] vfio-pci 0000:0c:00.1: reset done
> [ 86.762392] vfio-pci 0000:0c:00.0: resetting
> [ 86.865462] vfio-pci 0000:0c:00.0: reset done
> [ 86.977360] virbr0: port 1(vnet1) entered learning state
> [ 88.993052] virbr0: port 1(vnet1) entered forwarding state
> [ 88.993057] virbr0: topology change detected, propagating
> [ 103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
> [ 161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down

That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
does change the behavior of LFENCE and isn't just a reflection of the
hardware.

Linux does set that bit on boot, too (if LFENCE always serializing isn't
advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
there.

I imagine that the above CPUID bit isn't set, so an attempt is made to
set the MSR bit.

Thanks,
Tom