[PATCH tip/core/rcu 16/17] torture: Break affinity of kthreads last running on outgoing CPU

From: paulmck
Date: Wed Jan 06 2021 - 12:18:52 EST


From: "Paul E. McKenney" <paulmck@xxxxxxxxxx>

The advent of commit 06249738a41a ("workqueue: Manually break affinity
on hotplug") means that the scheduler no longer silently breaks affinity
for kthreads pinned to the outgoing CPU. Such pinning can happen for many
of rcutorture's kthreads due to shuffling, which periodically affinities
these kthreads away from a randomly chosen CPU. This usually works fine
because these kthreads are allowed to run on any other CPU and because
shuffling is a no-op any time there is but one online CPU.

However, consider the following sequence of events:

1. CPUs 0 and 1 are initially online.

2. The torture_shuffle_tasks() function affinities all the tasks
away from CPU 0.

3. CPU 1 goes offline.

4. All the tasks are now affinitied to an offline CPU, triggering
the warning added by the commit noted above.

This can trigger the following in sched_cpu_dying() in kernel/sched/core.c:

BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))

This commit therefore adds a new torture_shuffle_tasks_offline() function
that is invoked from torture_offline() prior to offlining a CPU. This new
function scans the list of shuffled kthreads and for any thread that
last ran (or is set to run) on the outgoing CPU, sets its affinity to
all online CPUs. Thus there will never be a kthread that is affinitied
only to the outgoing CPU.
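
As a rough illustration only, the failure sequence and the fix can be
modeled in userspace C with plain bitmasks. Nothing below is kernel code:
struct sim_task, sim_online_mask, and every sim_*() helper are
hypothetical names invented for this sketch, and sim_set_allowed() is
only a loose stand-in for set_cpus_allowed_ptr().

```c
#include <assert.h>

#define SIM_NR_CPUS 2

/* Hypothetical stand-in for a shuffled kthread. */
struct sim_task {
	unsigned int allowed;	/* Bitmask of CPUs the task may run on. */
	int cpu;		/* CPU the task last ran on. */
};

/* Bitmask of online CPUs; both CPUs start online, as in step 1. */
static unsigned int sim_online_mask = (1u << SIM_NR_CPUS) - 1;

/* Loose model of an affinity change: set the mask, migrate if needed. */
static void sim_set_allowed(struct sim_task *t, unsigned int mask)
{
	unsigned int usable = mask & sim_online_mask;
	int cpu;

	assert(usable != 0);	/* Callers below always leave one CPU. */
	t->allowed = mask;
	if (usable & (1u << t->cpu))
		return;		/* Current CPU is still acceptable. */
	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (usable & (1u << cpu)) {
			t->cpu = cpu;	/* Move to lowest usable CPU. */
			return;
		}
	}
}

/* Model of shuffling: affinity every task away from idle_cpu. */
static void sim_shuffle_tasks(struct sim_task *tasks, int n, int idle_cpu)
{
	int i;

	/* No-op when at most one CPU is online. */
	if ((sim_online_mask & (sim_online_mask - 1)) == 0)
		return;
	for (i = 0; i < n; i++)
		sim_set_allowed(&tasks[i], sim_online_mask & ~(1u << idle_cpu));
}

/* Model of the fix: widen affinity of tasks sitting on the outgoing CPU. */
static void sim_shuffle_tasks_offline(struct sim_task *tasks, int n, int cpu)
{
	int i;

	for (i = 0; i < n; i++)
		if (tasks[i].cpu == cpu)
			sim_set_allowed(&tasks[i], sim_online_mask);
}

/* Model of taking a CPU offline: just clear its bit. */
static void sim_remove_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	sim_online_mask &= ~(1u << cpu);
}

/* Count tasks whose allowed mask contains no online CPU. */
static int sim_count_stranded(const struct sim_task *tasks, int n)
{
	int i, stranded = 0;

	for (i = 0; i < n; i++)
		if ((tasks[i].allowed & sim_online_mask) == 0)
			stranded++;
	return stranded;
}
```

In this model, shuffling three tasks away from CPU 0 and then calling
sim_remove_cpu(1) leaves all three stranded, which corresponds to step 4
above. Calling sim_shuffle_tasks_offline() for CPU 1 before
sim_remove_cpu(1) leaves none stranded.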

Of course, if the sysadm manually applies affinity to any of these
kthreads, all bets are off. However, such a sysadm must be fast because
the torture_shuffle_tasks_offline() function is invoked immediately before
offlining the outgoing CPU. Therefore, let it be known that with great
speed and great power comes great responsibility.

Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
---
kernel/torture.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/kernel/torture.c b/kernel/torture.c
index 01e336f..40c5c68 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -155,6 +155,8 @@ EXPORT_SYMBOL_GPL(torture_hrtimeout_s);

#ifdef CONFIG_HOTPLUG_CPU

+static void torture_shuffle_tasks_offline(int cpu);
+
/*
* Variables for online-offline handling. Only present if CPU hotplug
* is enabled, otherwise does nothing.
@@ -212,6 +214,7 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes,
torture_type, cpu);
starttime = jiffies;
(*n_offl_attempts)++;
+ torture_shuffle_tasks_offline(cpu);
ret = remove_cpu(cpu);
if (ret) {
s = "";
@@ -512,6 +515,20 @@ static void torture_shuffle_task_unregister_all(void)
mutex_unlock(&shuffle_task_mutex);
}

+#ifdef CONFIG_HOTPLUG_CPU
+// Unbind all tasks from a CPU that is to be taken offline.
+static void torture_shuffle_tasks_offline(int cpu)
+{
+	struct shuffle_task *stp;
+
+	mutex_lock(&shuffle_task_mutex);
+	list_for_each_entry(stp, &shuffle_task_list, st_l)
+		if (task_cpu(stp->st_t) == cpu)
+			set_cpus_allowed_ptr(stp->st_t, cpu_online_mask);
+	mutex_unlock(&shuffle_task_mutex);
+}
+#endif // #ifdef CONFIG_HOTPLUG_CPU
+
/* Shuffle tasks such that we allow shuffle_idle_cpu to become idle.
* A special case is when shuffle_idle_cpu = -1, in which case we allow
* the tasks to run on all CPUs.
--
2.9.5