Re: [kvm-unit-tests Patch 0/5] Fix PMU test failures on Sapphire Rapids

From: Mingwei Zhang
Date: Sun Oct 29 2023 - 23:57:27 EST


On Thu, Oct 26, 2023, Mi, Dapeng wrote:
> On 10/26/2023 7:47 AM, Mingwei Zhang wrote:
> > On Tue, Oct 24, 2023, Dapeng Mi wrote:
> > > When running the pmu test on Intel Sapphire Rapids, we encountered
> > > several failures, such as the "llc misses" failure, the "all counters"
> > > failure and the "fixed counter 3" failure.
> > hmm, I have tested your series on a SPR machine. It looks like, all "llc
> > misses" already pass on my side. "all counters" always fail with/without
> > your patches. "fixed counter 3" never exists... I have "fixed
> > cntr-{0,1,2}" and "fixed-{0,1,2}"
>
> 1. "LLC misses" failure
>
> Yeah, the "LLC misses" failure is not always seen. I see the "LLC misses"
> failure 2~3 times out of 10 runs of the standalone PMU test, and you are
> more likely to see it if you run the full kvm-unit-tests. I think whether
> you see the "LLC misses" failure really depends on the current cache state
> of your system, i.e. how much cache is consumed by other programs. If there
> are lots of free cache lines on the system when running the pmu test, you
> are more likely to see the LLC misses failure, just like what I see below.
>
> PASS: Intel: llc references-7
> *FAIL*: Intel: llc misses-0
> PASS: Intel: llc misses-1
> PASS: Intel: llc misses-2
>
> 2. "all counters" failure
>
> Actually the "all counters" failure is not always seen, but that doesn't mean
> the current code is correct. In the current code, the "cnt" array in
> check_counters_many() is defined with length 10, but there are at least 11
> counters supported (8 GP counters + 3 fixed counters) on SPR, even though
> fixed counter 3 is not supported in current upstream code. Obviously there
> would be out-of-range memory access in check_counters_many().
>

ok, I will double check on these. Thanks.
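
To make point 2 concrete, here is a minimal sketch of the sizing concern. It
is not the kvm-unit-tests code: counters_overflowing(), fill_counters() and
the two length macros are hypothetical names used only for illustration, and
the counter counts are passed in as plain arguments rather than read from
CPUID.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants mirroring the lengths discussed in this thread. */
#define OLD_CNT_LEN 10
#define NEW_CNT_LEN 64

/* Number of counters that would not fit in an array of 'len' entries. */
static size_t counters_overflowing(unsigned nr_gp, unsigned nr_fixed,
                                   size_t len)
{
    size_t total = (size_t)nr_gp + nr_fixed;

    return total > len ? total - len : 0;
}

/*
 * Fill one slot per counter (GP first, then fixed), asserting on the
 * bound before every store instead of silently writing past the array.
 * Returns the number of slots used.
 */
static size_t fill_counters(int *cnt, size_t len, unsigned nr_gp,
                            unsigned nr_fixed)
{
    size_t n = 0;

    for (unsigned i = 0; i < nr_gp + nr_fixed; i++) {
        assert(n < len);
        cnt[n++] = (int)i;
    }
    return n;
}
```

With 8 GP + 3 fixed counters, a 10-entry array is one slot short; with fixed
counter 3 added it is two slots short, while a 64-entry array holds all 12.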

> >
> > You may want to double check the requirements of your series, not just
> > test under your own setting without explaining that setting in detail.
> >
> > Maybe what I am missing is your topdown series? If so, until your topdown
> > series is checked in, I don't see value in this series.
>
> 3. "fixed counter 3" failure
>
> Yeah, after Jim's reminder I just realized I used a kernel which includes
> the vtopdown support patches. As my reply to Jim's comments says, the
> patches that support the slots event are still valuable for the current
> emulation framework, and I will split them from the original vtopdown
> patchset and resend them as an independent patchset. Anyway, even though
> there is no slots event support in the kernel, that only impacts patch 4/5;
> the other patches are still valuable.
>
>
> >
> > Thanks.
> > -Mingwei
> > > Intel Sapphire Rapids introduces a new fixed counter 3, increases the
> > > total number of PMU counters (GP plus fixed) to 12, and also optimizes
> > > the cache subsystem. All these changes make the original assumptions in
> > > the pmu test no longer valid on Sapphire Rapids. Patches 2-4 fix these
> > > failures, patch 1 removes the duplicate code and patch 5 adds asserts to
> > > ensure the predefined fixed events match the HW fixed counters.
> > >
> > > Dapeng Mi (4):
> > > x86: pmu: Change the minimum value of llc_misses event to 0
> > > x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
> > > x86: pmu: Support validation for Intel PMU fixed counter 3
> > > x86: pmu: Add asserts to warn inconsistent fixed events and counters
> > >
> > > Xiong Zhang (1):
> > > x86: pmu: Remove duplicate code in pmu_init()
> > >
> > > lib/x86/pmu.c | 5 -----
> > > x86/pmu.c | 17 ++++++++++++-----
> > > 2 files changed, 12 insertions(+), 10 deletions(-)
> > >
> > >
> > > base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
> > > --
> > > 2.34.1
> > >