Re: [PATCH v3 06/27] x86/cea: Export per CPU variable cea_exception_stacks

From: Xin Li
Date: Tue Oct 01 2024 - 13:52:35 EST


On 10/1/2024 9:12 AM, Dave Hansen wrote:
On 9/30/24 22:00, Xin Li (Intel) wrote:
The per CPU variable cea_exception_stacks contains per CPU stacks for
NMI, #DB and #DF, which is referenced in KVM to set host FRED RSP[123]
each time a vCPU is loaded onto a CPU, thus it needs to be exported.

Nit: It's not obvious how 'cea_exception_stacks' get used in this
series. It's never referenced explicitly.

I did figure it out by looking for 'RSP[123]' references, but a much
better changelog would be something like:

The per CPU array 'cea_exception_stacks' points to per CPU
stacks for NMI, #DB and #DF. It is normally referenced via the
#define: __this_cpu_ist_top_va().

FRED introduced new fields in the host-state area of the VMCS
for stack levels 1->3 (HOST_IA32_FRED_RSP[123]). KVM must
populate these each time a vCPU is loaded onto a CPU.


Yeah, this is way clearer.

See how that explicitly gives the reader greppable strings for
"__this_cpu_ist_top_va" and "HOST_IA32_FRED_RSP"? That makes it much
easier to figure out what is going on.

Nice for a maintainer in 20 years :)


I was also momentarily confused about why these loads need to be done on
_every_ vCPU load. I think it's because the host state can change as
the vCPU moves around to different physical CPUs and
__this_cpu_ist_top_va() can and will change. But it's a detail that I
think deserves to be explained in the changelog. There is also this
note in vmx_vcpu_load_vmcs():

Makes sense to me.


/*
* Linux uses per-cpu TSS and GDT, so set these when switching
* processors. See 22.2.4.
*/

which makes me think that it might not be bad to pull *all* of the
per-cpu VMCS field population code out into a helper since the reasoning
of why these need to be repopulated is identical.

Also, what's the purpose of clearing GUEST_IA32_FRED_RSP[123] at
init_vmcs() time? I would have thought that those values wouldn't
matter until the VMCS gets loaded at vmx_vcpu_load_vmcs() when they are
overwritten anyway. Or, I could be just totally misunderstanding how
KVM consumes the VMCS. :)

I don't see any misunderstanding. However we just do what the SDM
claims, even it seems that it's not a must *logically*.

FRED spec says:
The RESET state of each of the new MSRs is zero. INIT does not change
the value of the new MSRs

Thanks!
Xin