[tip: sched/core] x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC

From: tip-bot2 for Giovanni Gherdovich
Date: Thu Dec 03 2020 - 04:15:31 EST


The following commit has been merged into the sched/core branch of tip:

Commit-ID: 46609527577d1def0af29ca5b56cffeeea771ada
Gitweb: https://git.kernel.org/tip/46609527577d1def0af29ca5b56cffeeea771ada
Author: Giovanni Gherdovich <ggherdovich@xxxxxxx>
AuthorDate: Thu, 12 Nov 2020 19:26:13 +01:00
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Thu, 03 Dec 2020 10:00:35 +01:00

x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC

Frequency invariant accounting calculations need the ratio
freq_curr/freq_max, but freq_max is unknown as it depends on dynamic power
allocation between cores: AMD EPYC CPUs implement "Core Performance Boost".
Three candidates are considered to estimate this value:

- maximum non-boost frequency
- maximum boost frequency
- the mid point between the above two

Experimental data on an AMD EPYC Zen2 machine slightly favors the third
option, which is applied with this patch.

The analysis uses the ondemand cpufreq governor as baseline, and compares
it with schedutil in a number of configurations. Using the freq_max value
described above offers a moderate advantage in performance and efficiency:

sugov-max (freq_max=max_boost) performs the worst on tbench: less
throughput and reduced efficiency than the other invariant-schedutil
options (see "Data Overview" below). Consider that tbench is generally a
problematic case as no schedutil version currently is better than ondemand.

sugov-P0 (freq_max=max_P) is the worst on dbench, while the other sugov's
can surpass ondemand with less filesystem latency and slightly increased
efficiency.

1. DATA OVERVIEW
2. DETAILED PERFORMANCE TABLES
3. POWER CONSUMPTION TABLE

1. DATA OVERVIEW
================

sugov-noinv : non-invariant schedutil governor
sugov-max : invariant schedutil, freq_max=max_boost
sugov-mid : invariant schedutil, freq_max=midpoint
sugov-P0 : invariant schedutil, freq_max=max_P
perfgov : performance governor

driver : acpi_cpufreq
machine : AMD EPYC 7742 (Zen2, aka "Rome"), dual socket,
128 cores / 256 threads, SATA SSD storage, 250G of memory,
XFS filesystem

Benchmarks are described in the next section.
Tilde (~) means the value is the same as baseline.

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20201112182614.10700-3-ggherdovich@xxxxxxx
---
arch/x86/kernel/smpboot.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index a4ab5cf..c5dd5f6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -2054,6 +2054,8 @@ static bool amd_set_max_freq_ratio(void)
}

perf_ratio = div_u64(highest_perf * SCHED_CAPACITY_SCALE, nominal_perf);
+ /* midpoint between max_boost and max_P */
+ perf_ratio = (perf_ratio + SCHED_CAPACITY_SCALE) >> 1;
if (!perf_ratio) {
pr_debug("Non-zero highest/nominal perf values led to a 0 ratio\n");
return false;