Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx

From: Ankur Arora
Date: Wed Oct 14 2020 - 15:26:33 EST

Next message: Vivek Unune: "[PATCH] ARM: dts: BCM5301X: Linksys EA9500 add port 5 and port 7"
Previous message: Florian Fainelli: "Re: [PATCH 02/11] firmware: arm_scmi: hide protocols' private data"
In reply to: Ingo Molnar: "Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx"
Next in thread: Ankur Arora: "[PATCH 1/8] x86/cpuid: add X86_FEATURE_NT_GOOD"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2020-10-14 8:31 a.m., Ingo Molnar wrote:

* Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:

System: Oracle X6-2
CPU: 2 nodes * 10 cores/node * 2 threads/core
Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
Memory: 256 GB evenly split between nodes
Microcode: 0xb00002e
scaling_governor: performance
L3 size: 25MB
intel_pstate/no_turbo: 1

Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
(X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):

x86-64-stosb (5 runs) x86-64-movnt (5 runs) speedup
----------------------- ----------------------- -------
size BW ( pstdev) BW ( pstdev)

16MB 17.35 GB/s ( +- 9.27%) 11.83 GB/s ( +- 0.19%) -31.81%
128MB 5.31 GB/s ( +- 0.13%) 11.72 GB/s ( +- 0.44%) +121.84%
1024MB 5.42 GB/s ( +- 0.13%) 11.78 GB/s ( +- 0.03%) +117.34%
4096MB 5.41 GB/s ( +- 0.41%) 11.76 GB/s ( +- 0.07%) +117.37%

+ if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
+ set_cpu_cap(c, X86_FEATURE_NT_GOOD);

So while I agree with how you've done careful measurements to isolate bad
microarchitectures where non-temporal stores are slow, I do think this
approach of opt-in doesn't scale and is hard to maintain.

Instead I'd suggest enabling this by default everywhere, and creating a
X86_FEATURE_NT_BAD quirk table for the bad microarchitectures.

Okay, some kind of quirk table is a great idea. Also means that there's a
single place for keeping this rather than it being scattered all over in
the code.

That also simplifies my handling of features like X86_FEATURE_CLZERO.
I was concerned that if you squint a bit, it seems to be an alias to
X86_FEATURE_NT_GOOD and that seemed ugly.

This means that with new microarchitectures we'd get automatic enablement,
and hopefully chip testing would identify cases where performance isn't as
good.

Makes sense to me. A first class citizen, as it were...

Thanks for reviewing btw.

Ankur

I.e. the 'trust but verify' method.

Thanks,

Ingo

Next message: Vivek Unune: "[PATCH] ARM: dts: BCM5301X: Linksys EA9500 add port 5 and port 7"
Previous message: Florian Fainelli: "Re: [PATCH 02/11] firmware: arm_scmi: hide protocols' private data"
In reply to: Ingo Molnar: "Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx"
Next in thread: Ankur Arora: "[PATCH 1/8] x86/cpuid: add X86_FEATURE_NT_GOOD"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]