Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid partition root

From: Waiman Long
Date: Fri Jul 16 2021 - 17:12:24 EST


On 7/16/21 4:46 PM, Tejun Heo wrote:
Hello, Waiman.

On Fri, Jul 16, 2021 at 04:08:15PM -0400, Waiman Long wrote:
I agree with you in principle. However, the reason why there are
more restrictions on enabling a partition is that I want to avoid
forcing the users to always read back cpuset.partition.type to see
if the operation succeeds instead of just getting an error from the
operation. The former approach is more error prone. If you don't
want changes in existing behavior, I can relax the checking and
allow them to become an invalid partition if an illegal operation
happens.

Also there is now another cpuset patch to extend cpu isolation to
cgroup v1 [1]. I think it is better suited to the cgroup v2 partition
scheme, but cgroup v1 is still in quite heavy use out there.

Please let me know what you want me to do and I will send out a v3
version.
Note that the current cpuset partition implementation has implemented
some restrictions on when a partition can be enabled. However, I missed
some corner cases in the original implementation that allow certain
cpuset operations to make a partition invalid. I tried to plug those
holes in this patchset. However, if maintaining backward compatibility
is more important, I can leave those holes and update the documentation
to make sure that people check cpuset.partition.type to confirm if their
operation succeeds.
I just realized that a partition root sets the CPU_EXCLUSIVE bit. So changes
to cpuset.cpus that break the exclusivity rule are not allowed anyway. This patchset
is just adding additional checks so that cpuset.cpus changes that break the
partition root rules will not be allowed. I can remove those additional
checks for this patchset and allow cpuset.cpus changes that break the
partition root rules to make it invalid instead. However, I still want
invalid changes to cpuset.partition.type to be disallowed.
So, I get the instinct to disallow these operations and it'd make sense if
the conditions aren't reachable otherwise. However, I'm afraid what users
eventually get is a false sense of security rather than any actual guarantee.

Inconsistencies like this cause actual usability hazards - e.g. imagine a
system config script which sets up an exclusive cpuset and let's say that the
use case is fine with degraded operation when the target cores are offline
(e.g. energy save mode w/ only low power cores online). Let's say this
script runs in late stages during boot and has been reliable. However, at
some point, there are changes in the boot sequence and now there's a low but
non-trivial chance that the system would already be in low power state when
the script runs. Now the script will fail sporadically and the whole thing
would be pretty awkward to debug.

I'd much prefer to have an explicit interface to confirm the eventual state
and a way to monitor state transitions (without polling). An invalid state
is an inherent part of cpuset configuration. I'd much rather have that
really explicit in the interface even if that means a bit of extra work at
configuration time.

Are you suggesting that we add a cpuset.cpus.events file that allows
processes to be notified when an event (e.g. hotplug) turns a partition
root into an invalid partition, or when an explicit change to a
partition root fails? Will that be enough to satisfy your requirement?
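
For illustration only, the "read back to confirm" pattern discussed above could look roughly like the sketch below. It is a minimal sketch, not part of the patchset. It assumes the partition file is named cpuset.cpus.partition, as in the cgroup v2 documentation (this thread calls it cpuset.partition.type), and that an invalid partition root reads back with the word "invalid" in it. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: file name taken from the cgroup v2 documentation; this
# thread refers to the same control file as cpuset.partition.type.
PARTITION_FILE = "cpuset.cpus.partition"


def parse_partition_state(text):
    """Split the file content into (kind, valid).

    Assumes an invalid partition root reads back as "root invalid",
    possibly followed by extra words.
    """
    words = text.strip().split()
    kind = words[0] if words else ""
    valid = bool(words) and "invalid" not in words
    return kind, valid


def enable_partition(cgroup_dir):
    """Hypothetical helper: request a partition root, then confirm it.

    Returns True only if the write was accepted and the state read back
    afterwards is a valid partition root.
    """
    path = Path(cgroup_dir) / PARTITION_FILE
    try:
        path.write_text("root")
    except OSError:
        # The write itself was rejected (or the file does not exist).
        return False
    kind, valid = parse_partition_state(path.read_text())
    return kind == "root" and valid
```

A config script using something like this would treat a False return as "degraded" rather than assuming the write succeeding means the partition is in effect.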

Cheers,
Longman