Re: [PATCH v2] KVM: x86/pmu: Fix emulation on Intel counters' bit width

From: Like Xu
Date: Tue Mar 28 2023 - 05:16:49 EST


On 27/3/2023 10:30 pm, Paolo Bonzini wrote:
On Wed, Mar 22, 2023 at 10:31 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:

From: Like Xu <likexu@xxxxxxxxxxx>

Per Intel SDM, the bit width of a PMU counter is specified via CPUID
only if the vCPU has FW_WRITE[bit 13] on IA32_PERF_CAPABILITIES.
When the FW_WRITE bit is not set, only EAX is valid and out-of-bounds
bits accesses do not generate #GP. Conversely when this bit is set, #GP
for out-of-bounds bits accesses will also appear on the fixed counters.
vPMU currently does not support emulation of bit widths lower than 32
bits or higher than its host capability.

Can you please point out the date and paragraph of the SDM?

Paolo


325462-078US, December 2022
20.2.6 Full-Width Writes to Performance Counter Registers

The general-purpose performance counter registers IA32_PMCx are writable via WRMSR instruction.
However, the value written into IA32_PMCx by WRMSR is the signed extended 64-bit value of the
EAX[31:0] input of WRMSR.

A processor that supports full-width writes to the general-purpose performance counters enumerated by
CPUID.0AH:EAX[15:8] will set IA32_PERF_CAPABILITIES[13] to enumerate its full-width-write
capability. See Figure 20-65.

If IA32_PERF_CAPABILITIES.FW_WRITE[bit 13] =1, each IA32_PMCi is accompanied by a
corresponding alias address starting at 4C1H for IA32_A_PMC0.

The bit width of the performance monitoring counters is specified in CPUID.0AH:EAX[23:16].
If IA32_A_PMCi is present, the 64-bit input value (EDX:EAX) of WRMSR to IA32_A_PMCi will cause
IA32_PMCi to be updated by:

COUNTERWIDTH = CPUID.0AH:EAX[23:16] (bit width of the performance monitoring counter)
IA32_PMCi[COUNTERWIDTH-1:32] := EDX[COUNTERWIDTH-33:0];
IA32_PMCi[31:0] := EAX[31:0];
EDX[63:COUNTERWIDTH] are reserved
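
To make the two write paths quoted above concrete, here is a minimal user-space sketch. It is not KVM code; the helper names (legacy_pmc_value, counter_mask, full_width_pmc_write) are hypothetical, and the "return false" stands in for the #GP that is expected when reserved bits are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Legacy write to IA32_PMCx: only EAX[31:0] is used, and the value
 * written is that input sign-extended to 64 bits.
 */
static inline uint64_t legacy_pmc_value(uint32_t eax)
{
    return (eax & 0x80000000u) ? (0xffffffff00000000ull | eax) : eax;
}

/* Mask of the valid counter bits for a given COUNTERWIDTH. */
static inline uint64_t counter_mask(unsigned int width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? ~0ull : (1ull << width) - 1;
}

/*
 * Full-width write to IA32_A_PMCx: EDX:EAX is taken as is, and every bit
 * at or above COUNTERWIDTH is reserved. Returns false where a #GP would
 * be expected (assumption: this sketch models #GP as a failed write).
 */
static inline bool full_width_pmc_write(uint64_t *pmc, unsigned int width,
                                        uint32_t edx, uint32_t eax)
{
    uint64_t value = ((uint64_t)edx << 32) | eax;

    if (value & ~counter_mask(width))
        return false;   /* reserved bits set */

    *pmc = value;
    return true;
}
```

With a 48-bit counter, full_width_pmc_write(&pmc, 48, 0xffff, 1) succeeds, while an EDX value of 0x10000 touches bit 48 and is rejected.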

---

Some might argue that this section only talks about GP counters, not fixed counters.
However, the full-width write hardware behaviour is presumed to be the same for all counters.

Commercial hardware will not use fewer than 32 bits, or an odd bit width such as 46 bits.
But a KVM user space (such as the selftests) may set a strange bit width, for example 33 bits,
and with the current code, writing the reserved bits of the fixed counters does not cause #GP.

Also, when the guest does not have the full-width feature, the fixed counters can be more than
32 bits wide via CPUID while the GP counters are only 32 bits wide, which is just as odd.
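
As an illustration of the widths being discussed, the sketch below decodes both bit widths from CPUID.0AH and checks a write against the reserved bits. The GP width field is the one quoted from the SDM above; the fixed-counter width field (EDX[12:5]) is not mentioned in this thread and is added here from the CPUID leaf definition. The struct and function names are hypothetical, not KVM's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pmu_widths {
    unsigned int gp_bits;     /* CPUID.0AH:EAX[23:16] */
    unsigned int fixed_bits;  /* CPUID.0AH:EDX[12:5]  */
};

/* Extract the GP and fixed counter bit widths from raw CPUID.0AH output. */
static inline struct pmu_widths decode_cpuid_0a(uint32_t eax, uint32_t edx)
{
    struct pmu_widths w = {
        .gp_bits = (eax >> 16) & 0xff,
        .fixed_bits = (edx >> 5) & 0xff,
    };
    return w;
}

/* True if 'value' sets any bit at or above 'width', i.e. a reserved bit. */
static inline bool write_hits_reserved_bits(unsigned int width, uint64_t value)
{
    assert(width >= 1 && width <= 64);
    uint64_t mask = width == 64 ? ~0ull : (1ull << width) - 1;

    return (value & ~mask) != 0;
}
```

For the 33-bit example, a value with bit 33 set hits the reserved range, which is the case where a #GP would be expected for GP and fixed counters alike.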

The current KVM is also not capable of emulating counter overflow when KVM user space sets
a bit width of less than 32 bits with FW_WRITE.
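
For reference, overflow at an arbitrary bit width can be sketched as below. This is only a model of a counter wrapping at 'width' bits, not a description of what KVM does today; pmc_advance is a hypothetical name, and the sketch assumes the increment fits within the counter width (larger increments are truncated, so multiple wraps are not counted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Advance a counter that is 'width' bits wide by 'delta' events.
 * Returns true if the counter wrapped, i.e. an overflow should be reported.
 */
static inline bool pmc_advance(uint64_t *counter, unsigned int width,
                               uint64_t delta)
{
    assert(width >= 1 && width <= 63);
    uint64_t mask = (1ull << width) - 1;
    /* Both operands are below 2^63, so the sum cannot wrap a uint64_t. */
    uint64_t sum = (*counter & mask) + (delta & mask);

    *counter = sum & mask;
    return sum > mask;
}
```

With a 24-bit counter holding 0xffffff, one more event wraps it to 0 and reports an overflow.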

This SDM-undefined behaviour is what led to this fix; I hope that lifts some of the fog.