[RFC PATCH 2/2] sched: Build L2 cache scheduler domain for x86

From: Tim Chen
Date: Fri Aug 21 2020 - 20:01:22 EST


To prevent oversubscription of the L2 cache, load should be balanced
between L2 cache domains.

Add new scheduler domain at the L2 cache level for x86.

On benchmarks such as the SPECrate mcf test, this change provides a
performance boost on a medium-load Jacobsville system.

Note that this added domain level will increase migrations
between CPUs. So this is not necessarily a universal win if
the migration cost of balancing L2 load outweighs the benefit
from reduced L2 contention. This change tends to benefit CPU-bound
threads, which get moved around much less.

Note also that if the L2 sched domain is identical to the SMT sched domain
(i.e. a single core per L2), it is degenerate and will not be added when
sched domains are built in the cpu_attach_domain phase. The new
sched domain is only added when an L2 cache is shared among CPU cores.

The L2 cache information is detected after the initial build of scheduler
domains during boot. So it is necessary to rebuild the sched domains
after all the CPUs have been fully brought up.

Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
---
arch/x86/Kconfig | 15 +++++++++++++++
arch/x86/kernel/cpu/cacheinfo.c | 3 +++
arch/x86/kernel/smpboot.c | 14 ++++++++++++++
init/main.c | 3 +++
4 files changed, 35 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb20..97775ec16e72 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1014,6 +1014,21 @@ config SCHED_MC
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.

+config SCHED_MC_L2
+ def_bool n
+ prompt "Multi-core L2 cache scheduler domain support"
+ depends on SCHED_MC && SMP
+ help
+ Adding a level 2 cache scheduler domain lets the CPU scheduler
+ balance load between L2 caches. This reduces oversubscription
+ of the L2 cache on systems where multiple CPU cores share
+ an L2 cache. This option benefits systems with mostly CPU
+ bound tasks. For tasks that wake up and sleep frequently,
+ this option increases the frequency of task migrations and
+ the load balancing latency.
+
+ If unsure say N here.
+
config SCHED_MC_PRIO
bool "CPU core priorities scheduler support"
depends on SCHED_MC && CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c7503be92f35..fb3facab58d0 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1030,6 +1030,9 @@ static int __populate_cache_leaves(unsigned int cpu)
__cache_cpumap_setup(cpu, idx, &id4_regs);
}
this_cpu_ci->cpu_map_populated = true;
+#ifdef CONFIG_SCHED_MC_L2
+ x86_topology_update = true;
+#endif

return 0;
}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8ba0b505f020..80cdccd1bcab 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -528,6 +528,14 @@ static int x86_core_flags(void)
{
return cpu_core_flags() | x86_sched_itmt_flags();
}
+
+#ifdef CONFIG_SCHED_MC_L2
+static int x86_l2mc_flags(void)
+{
+ return cpu_core_flags() | x86_sched_itmt_flags();
+}
+#endif
+
#endif
#ifdef CONFIG_SCHED_SMT
static int x86_smt_flags(void)
@@ -542,6 +550,9 @@ static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
+#ifdef CONFIG_SCHED_MC_L2
+ { cpu_l2group_mask, x86_l2mc_flags, SD_INIT_NAME(L2MC) },
+#endif
{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
#endif
{ NULL, },
@@ -552,6 +563,9 @@ static struct sched_domain_topology_level x86_topology[] = {
{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
+#ifdef CONFIG_SCHED_MC_L2
+ { cpu_l2group_mask, x86_l2mc_flags, SD_INIT_NAME(L2MC) },
+#endif
{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
#endif
{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
diff --git a/init/main.c b/init/main.c
index ae78fb68d231..f4f814f8a127 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1405,6 +1405,9 @@ static int __ref kernel_init(void *unused)
ftrace_free_init_mem();
free_initmem();
mark_readonly();
+#ifdef CONFIG_SCHED_MC_L2
+ rebuild_sched_domains();
+#endif

/*
* Kernel mappings are now finalized - update the userspace page-table
--
2.20.1