On 4/1/25 3:59 PM, Tejun Heo wrote:
Hello, Waiman.
On Mon, Mar 31, 2025 at 11:12:06PM -0400, Waiman Long wrote:
The problem is the RCU delay between the time a cgroup is killed and enters a
dying state and the time the partition is deactivated when cpuset_css_offline()
is called. That delay can be rather lengthy depending on the current
workload.
If we don't have to do it too often, synchronize_rcu_expedited() may be
workable too. What do you think?
I don't think we ever call synchronize_rcu() in the cgroup code except for rstat flush. In fact, we didn't use to have an easy way to know if there were dying cpusets hanging around. Now we can probably use the root cgroup's nr_dying_subsys[cpuset_cgrp_id] to decide whether a synchronize_rcu*() call is needed to wait for them. However, I still need to check if there is any race window that would cause us to miss one.
Because of the above, I still prefer either using the original patch or scanning for dying cpuset partitions when a conflict is detected. Please let me know what you think about it.
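A minimal userspace model of the fast path mentioned above, assuming a plain counter stands in for the root cgroup's nr_dying_subsys[cpuset_cgrp_id] and a stub stands in for synchronize_rcu_expedited(). The names struct root_model, fake_synchronize_rcu_expedited() and maybe_wait_for_dying_cpusets() are hypothetical. The model ignores the race window noted above (a cpuset that starts dying right after the counter is read). It also does not capture that an RCU grace period alone may not guarantee the offline callback has already run.

```c
#include <stdbool.h>

/* Hypothetical stand-in for root cgroup's nr_dying_subsys[cpuset_cgrp_id]. */
struct root_model {
    int nr_dying_cpusets;
};

/* Counts simulated waits so the behavior can be observed. */
static int grace_periods_waited;

/* Hypothetical stub for synchronize_rcu_expedited(); it only records the call. */
static void fake_synchronize_rcu_expedited(void)
{
    grace_periods_waited++;
}

/*
 * Wait only when dying cpusets exist, so the common case pays nothing.
 * Returns true if a (simulated) wait was performed.
 */
static bool maybe_wait_for_dying_cpusets(const struct root_model *root)
{
    if (root->nr_dying_cpusets == 0)
        return false;               /* nothing dying: skip the expensive wait */
    fake_synchronize_rcu_expedited();
    return true;
}
```

The point of the counter check is that the expedited wait is paid only when a dying cpuset actually exists, which matches the "if we don't have to do it too often" condition in the thread.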
Another alternative that I can think of is to scan the remote partition list
for remote partitions and the sibling cpusets for local partitions whenever
some kind of conflict is detected when enabling a partition. When a dying
cpuset partition is detected, deactivate it immediately to resolve the
conflict. Otherwise, the dying partition will still be deactivated at
cpuset_css_offline() time.
That will be a bit more complex, but I think it can still solve the problem
without adding a new method. What do you think? If you are OK with that, I
will send out a new patch later this week.
If synchronize_rcu_expedited() won't do, let's go with the original patch.
The operation does make general sense in that it's for a distinctive step in
the destruction process although I'm a bit curious why it's called before
DYING is set.
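The alternative of scanning sibling cpusets for dying partitions when a conflict is detected can be sketched as a userspace model. It assumes each cpuset is reduced to a CPU bitmask plus "partition" and "dying" flags. The names struct cs_model, enable_partition() and deactivate_partition() are hypothetical and are not the kernel's cpuset code. Only the local-partition (sibling) case is modelled. The remote-partition list scan would presumably follow the same pattern over a different list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cpuset; not the kernel's struct cpuset. */
struct cs_model {
    unsigned long cpus;   /* bitmask of exclusive CPUs requested */
    bool is_partition;    /* currently an active partition root */
    bool is_dying;        /* killed, but offline callback not yet run */
};

/* Deactivate a partition early, as the offline path would do later. */
static void deactivate_partition(struct cs_model *cs)
{
    cs->is_partition = false;
}

/*
 * Try to make sibs[idx] a partition root. A CPU overlap with a live
 * partition is a real conflict; an overlap with a dying partition is
 * resolved by deactivating that partition immediately.
 */
static bool enable_partition(struct cs_model *sibs, size_t n, size_t idx)
{
    assert(idx < n);

    /* First pass: fail if any live sibling partition overlaps. */
    for (size_t i = 0; i < n; i++) {
        if (i == idx || !sibs[i].is_partition)
            continue;
        if (!(sibs[i].cpus & sibs[idx].cpus))
            continue;              /* no overlap, no conflict */
        if (!sibs[i].is_dying)
            return false;          /* genuine conflict with a live partition */
    }

    /* Second pass: tear down only the dying partitions that overlap. */
    for (size_t i = 0; i < n; i++) {
        if (i != idx && sibs[i].is_partition && sibs[i].is_dying &&
            (sibs[i].cpus & sibs[idx].cpus))
            deactivate_partition(&sibs[i]);
    }

    sibs[idx].is_partition = true;
    return true;
}
```

The two-pass loop is a choice made for this sketch. Dying partitions are torn down only after it is known that no live partition also conflicts, so a rejected request leaves existing state untouched.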