Re: [PATCH 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous

From: Waiman Long
Date: Tue Apr 02 2024 - 11:31:16 EST



On 4/2/24 10:13, Michal Koutný wrote:
Hello Waiman.

(I have no opinion on the overall locking reworks, only the bits about
v1 migrations caught my attention.)

On Mon, Apr 01, 2024 at 10:58:57AM -0400, Waiman Long <longman@xxxxxxxxxx> wrote:
...
@@ -4383,12 +4377,20 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
/*
* Move tasks to the nearest ancestor with execution resources,
* This is full cgroup operation which will also call back into
- * cpuset. Should be done outside any lock.
+ * cpuset. Execute it asynchronously using workqueue.
...to avoid deadlock on cpus_read_lock

Is this the reason?
Also, what happens with the tasks in the window till the migration
happens?
Is it handled gracefully that their cpu is gone?

Yes, there is a potential that a cpus_read_lock() may be called leading to deadlock. So unless we reverse the current cgroup_mutex --> cpu_hotplug_lock ordering, it is not safe to call cgroup_transfer_tasks() directly.




- if (is_empty) {
- mutex_unlock(&cpuset_mutex);
- remove_tasks_in_empty_cpuset(cs);
- mutex_lock(&cpuset_mutex);
+ if (is_empty && css_tryget_online(&cs->css)) {
+ struct cpuset_remove_tasks_struct *s;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
Is there a benefit of having a work for each cpuset?
Instead of traversing whole top_cpuset once in the async work.

We could do that too. It's just that we have the repeat the iteration process once the workfn is invoked, but that has the advantage of not needing to do memory allocation. I am OK with either way. Let's see what other folks think about that.

Cheers,
Longman