RE: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation

From: Luck, Tony
Date: Wed Oct 11 2023 - 11:16:55 EST


> kernel test robot noticed a -8.8% regression of stress-ng.clock.ops_per_sec on:
>
>
> commit: 26bff7b04b829cccc6a97726d6398391a62e34ef ("[PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation")
> url: https://github.com/intel-lab-lkp/linux/commits/Tony-Luck/x86-mce-Remove-old-CMCI-storm-mitigation-code/20231005-024047
> patch link: https://lore.kernel.org/all/20231004183623.17067-3-tony.luck@xxxxxxxxx/
> patch subject: [PATCH v9 2/3] x86/mce: Add per-bank CMCI storm mitigation
>
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> nr_threads: 10%
> disk: 1HDD
> testtime: 60s
> fs: ext4
> class: os
> test: clock
> cpufreq_governor: performance
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202310111637.dee70328-oliver.sang@xxxxxxxxx

Is the test injecting massive numbers of corrected memory errors? The code in this patch
is only executed when handling CMCI interrupts, or polling machine check banks (at most
once per second).
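
One rough way to check whether that code ran at all during the test is to
compare machine check interrupt counts before and after a run. The sketch
below assumes that on x86 CMCI arrives on the threshold interrupt vector and
is counted in the THR row of /proc/interrupts, with bank polls in the MCP
row. The function names are made up for illustration.

```python
def irq_totals(text, labels=("THR", "MCP")):
    """Sum the per-CPU counts of selected rows in /proc/interrupts text.

    Assumption: THR (threshold APIC interrupts) covers CMCI on Intel,
    and MCP counts machine check polls.
    """
    totals = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        if not sep or name not in labels:
            continue  # header line or a row we do not care about
        count = 0
        for tok in rest.split():
            if not tok.isdigit():
                break  # reached the trailing description text
            count += int(tok)
        totals[name] = count
    return totals


def read_totals(path="/proc/interrupts"):
    """Hypothetical helper: take one snapshot from the running system."""
    with open(path) as f:
        return irq_totals(f.read())


def delta(before, after):
    """Per-row increase between two snapshots."""
    return {k: after[k] - before.get(k, 0) for k in after}
```

Take one read_totals() snapshot before the stress-ng run and one after, then
pass both to delta(). If THR barely moves and MCP grows by no more than about
one per CPU per second, the new code is unlikely to be where the time went.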

I'm guessing this report is just a side effect of the patch changing the
alignment of some hot-path code.

-Tony