Re: [PATCH v9 07/22] KVM: VMX: Initialize VMCS FRED fields

From: Sean Christopherson

Date: Thu Mar 05 2026 - 10:26:42 EST


On Wed, Mar 04, 2026, Xin Li wrote:
>
>
> > On Mar 4, 2026, at 8:23 AM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 21, 2026, Binbin Wu wrote:
> >> On 10/27/2025 4:18 AM, Xin Li (Intel) wrote:
> >>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>> index fcfa99160018..c8b5359123bf 100644
> >>> --- a/arch/x86/kvm/vmx/vmx.c
> >>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>> @@ -1459,6 +1459,15 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu)
> >>> (unsigned long)(cpu_entry_stack(cpu) + 1));
> >>> }
> >>>
> >>> + /* Per-CPU FRED MSRs */
> >
> > Meh, this is pretty self-explanatory code.
>
> I want to remind that this is “Per-CPU”.
>
> On the other side, FRED_CONFIG and FRED_STKLVLS are typically the same on
> all CPUs, and they don’t need to be updated during vCPU migration.
>
> But anyway vCPU migration only cares per-CPU MSRs, so this is redundant.
>
> It often bothers me whether to explain the code a bit more or not, with a
> short brief comment or a lengthy one.

Oh, I'm all for comments, But I generally dislike terse comments like the above,
as they're only useful for readers that *already* know many of the gory details,
and for everyone else, it's largely just noise.

E.g. in this case, what the subtlety of what you were trying to convey with
"Per-CPU" was completely lost on me. It would have been a wee bit more helpful
in earlier versions that used RDMSR, but even then it required the reader to make
large leaps of intuition to understand the full context.

And the terse comment is also "wrong" in the sense that it lies by omission,
because this isn't *all* of the per-CPU MSRs.

E.g.

/*
* Update the FRED RSP MSRs values that are restored on VM-Exit, as
* Linux uses dedicated per-CPU stacks for things like #DBs and NMIs.
* Note, RSP0 is _only_ used to load RSP on user=>kernel transitions,
* i.e. doesn't need to be loaded on VM-Exit and so doesn't have a VMCS
* field. Note #2, the SSP MSRs will likely be per-CPU as well, but
* Linux doesn't yet support supervisor shadow stacks.
*/

> >>> + if (kvm_cpu_cap_has(X86_FEATURE_FRED)) {
> >>> +#ifdef CONFIG_X86_64
> >>
> >> Nit:
> >>
> >> Is this needed?
> >>
> >> FRED is initialized by X86_64_F(), if CONFIG_X86_64 is not enabled, this
> >> path is not reachable.
> >> There should be no compilation issue without #ifdef CONFIG_X86_64 / #endif.
> >>
> >> There are several similar patterns in this patch, using #ifdef CONFIG_X86_64 /
> >> #endif or not seems not consistent. E.g. __vmx_vcpu_reset() and init_vmcs()
> >> doesn't check the config, but here does.
> >>
> >>> + vmcs_write64(HOST_IA32_FRED_RSP1, __this_cpu_ist_top_va(ESTACK_DB));
> >>> + vmcs_write64(HOST_IA32_FRED_RSP2, __this_cpu_ist_top_va(ESTACK_NMI));
> >>> + vmcs_write64(HOST_IA32_FRED_RSP3, __this_cpu_ist_top_va(ESTACK_DF));
> >
> > IMO, this is flawed for other reasons. KVM shouldn't be relying on kernel
> > implementation details with respect to what FRED stack handles what event.
> >
> > The simplest approach would be to read the actual MSR. _If_ using a per-CPU read
> > provides meaningful performance benefits over RDMSR (or RDMSR w/ immediate? I
> > don't see an API for that...), then have the kernel provide a dedicated accessor.
>
> I think you asked for it:
>
> https://lore.kernel.org/kvm/ZmoWB_XtA0wR2K4Q@xxxxxxxxxx/

Gah, so I did. FWIW, comments like that aren't intended to be "you must do it
this way", i.e. it's ok to push back (if there's justification to do so), but I
can totally see how it came across that way.

> I assume fetching through per-CPU cache is fast, but I might have misunderstood
> your suggestion.

Oh, per-CPU cache is likely faster than RDMSR (though it would be nice to verify).
The part I specifically don't like is referencing DB, NMI, and DF stacks.

> > Then the accessor can be a non-inlined functions, and this code can be e.g.:
> >
> > if (IS_ENABLED(CONFIG_X86_64) && kvm_cpu_cap_has(X86_FEATURE_FRED)) {
> > vmcs_write64(HOST_IA32_FRED_RSP1, fred_rsp(MSR_IA32_FRED_RSP1));
> > vmcs_write64(HOST_IA32_FRED_RSP2, fred_rsp(MSR_IA32_FRED_RSP2));
> > vmcs_write64(HOST_IA32_FRED_RSP3, fred_rsp(MSR_IA32_FRED_RSP2));
^
|
3 (guess who failed at copy+paste)
> > }
> >
> > where fred_rsp() is _declared_ unconditionally, but implemented only for 64-bit.
> > That way the compiler will be happy, and the actual usage will be dropped before
> > linking via dead-code elimination.
>
> If KVM can’t rely on kernel side implementation details, fred_rsp() has to read
> directly from the corresponding MSRs, right?

No. If the kernel provides an API specifically for getting RSP values, then KVM
isn't rely on implementation details, KVM is relying on an (exported?) API, and
it's the _kernel's_ responsibility to ensure that API is updated as needed.

Yeah, yeah, KVM is also part of the kernel, but it's a _lot_ easier to forget to
update code when the implementation details change when some of the usage is "far"
away.

> > Actually, we can probably do one better?
> >
> > if (cpu_feature_enabled(X86_FEATURE_FRED) && kvm_cpu_cap_has(X86_FEATURE_FRED)) {
>
> I think KVM now forces X86_FEATURE_FRED=y, no?

32-bit doesn't :-D

But even for 64-bit, X86_FEATURE_FRED should be cleared when FRED isn't supported
by hardware, in which case cpu_feature_enabled() still patches out the text.