[RFC PATCH v4 3/4] cgroup/cpuset: Restart CPUs whose isolated_cpus bits have changed
From: Costa Shulyupin
Date: Sun Dec 01 2024 - 07:43:38 EST
The goal is to dynamically isolate CPUs to prevent interference
from housekeeping subsystems.
The housekeeping CPU masks, set up by the "isolcpus" and "nohz_full"
boot command line options, are used at boot time to exclude selected
CPUs from running some kernel housekeeping subsystems to minimize
interference with latency-sensitive userspace applications such as DPDK.
These options can only be changed with a reboot. This is a problem for
containerized workloads running on OpenShift/Kubernetes where a mix of
low latency and "normal" workloads can be created/destroyed dynamically
and the number of CPUs allocated to each workload is often not known at
boot time.
CPU hotplug can be used to isolate CPUs by restarting only the affected
CPUs, without a complete reboot.
Experimental solution:
Automatically restart the changed CPUs when `isolated_cpus` is modified
through the cgroup/cpuset interface.
No additional manipulation of the CPU online status from userspace is
required, and it remains compatible with existing software.
cpu_device_down()/cpu_device_up() can't be called from within subroutines
of cpuset_write_resmask() because cpuset_write_resmask() holds
`cpu_hotplug_lock` via cpus_read_lock(), while _cpu_down()/_cpu_up()
take `cpu_hotplug_lock` via cpus_write_lock().
Intuitively, the change of `isolated_cpus` should be performed between
cpu_device_down() and cpu_device_up(). However, since cpu_device_down(),
at least for managed interrupts, doesn't depend on `isolated_cpus` and
`housekeeping`, it is simpler to call cpu_device_down() after the change
of `isolated_cpus` and `housekeeping` and after cpus_read_unlock().
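As a rough userspace model of which CPUs get restarted, the selection
can be thought of as "online CPUs whose isolation bit differs between
the old and new masks". The sketch below assumes at most 64 CPUs
packed into one integer; cpu_bitmap_t, cpus_to_restart() and
restart_changed_cpus() are hypothetical names for illustration and do
not use the kernel cpumask API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_bitmap_t;

/* Online CPUs whose isolation bit changed between prev and cur. */
static cpu_bitmap_t cpus_to_restart(cpu_bitmap_t prev, cpu_bitmap_t cur,
				    cpu_bitmap_t online)
{
	return (prev ^ cur) & online;
}

/*
 * Walk the changed CPUs and print the offline/online sequence that
 * would be applied to each one. Returns the number of CPUs restarted.
 */
static unsigned int restart_changed_cpus(cpu_bitmap_t prev, cpu_bitmap_t cur,
					 cpu_bitmap_t online)
{
	cpu_bitmap_t todo = cpus_to_restart(prev, cur, online);
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (todo & ((cpu_bitmap_t)1 << cpu)) {
			printf("cpu%u: offline, then online\n", cpu);
			count++;
		}
	}
	return count;
}
```

For example, moving the isolated set from CPUs 2-3 to CPUs 3-4 leaves
CPU 3 untouched and selects only CPUs 2 and 4, provided both are
online.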
Signed-off-by: Costa Shulyupin <costa.shul@xxxxxxxxxx>
---
kernel/cgroup/cpuset.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 570941d782ef..d5d2b4036314 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3131,6 +3131,27 @@ static void cpuset_attach(struct cgroup_taskset *tset)
mutex_unlock(&cpuset_mutex);
}
+/*
+ * Restart CPUs whose isolated_cpus bits have changed.
+ * Enforce subsystems to adopt the new isolated_cpus and housekeeping masks
+ * using CPU hotplug.
+ */
+static void propogate_isolated_cpus_change(struct cpumask *isolated_cpus_prev)
+{
+ unsigned int cpu;
+
+ if (!isolated_cpus_prev)
+ return;
+
+ for_each_online_cpu(cpu) {
+ if (cpumask_test_cpu(cpu, isolated_cpus_prev) !=
+ cpumask_test_cpu(cpu, isolated_cpus)) {
+ remove_cpu(cpu);
+ add_cpu(cpu);
+ }
+ }
+}
+
/*
* Common handling for a write to a "cpus" or "mems" file.
*/
@@ -3138,6 +3159,7 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
struct cpuset *cs = css_cs(of_css(of));
+ cpumask_var_t isolated_cpus_prev;
struct cpuset *trialcs;
int retval = -ENODEV;
@@ -3167,6 +3189,12 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
cpus_read_lock();
mutex_lock(&cpuset_mutex);
+ if (!alloc_cpumask_var(&isolated_cpus_prev, GFP_KERNEL)) {
+ retval = -ENOMEM;
+ goto out_unlock;
+ }
+
+ cpumask_copy(isolated_cpus_prev, isolated_cpus);
if (!is_cpuset_online(cs))
goto out_unlock;
@@ -3200,6 +3228,13 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
kernfs_unbreak_active_protection(of->kn);
css_put(&cs->css);
flush_workqueue(cpuset_migrate_mm_wq);
+
+ /* If isolated_cpus modified, the change must be propagated
+ * to all subsystems.
+ */
+ propogate_isolated_cpus_change(isolated_cpus_prev);
+ free_cpumask_var(isolated_cpus_prev);
+
return retval ?: nbytes;
}
--
2.47.0