Hello, Waiman.

On Sun, May 28, 2023 at 05:18:50PM -0400, Waiman Long wrote:
> On 5/22/23 15:49, Tejun Heo wrote:
>
> Sorry for the late reply as I had been off for almost 2 weeks due to PTO.

And me too. Just moved.

> > Why is the syntax different from .cpus? Wouldn't it be better to keep them
> > the same?
>
> Unlike cpuset.cpus, cpuset.cpus.reserve is supposed to contain CPUs that
> are used in multiple partitions. Also automatic reservation of adjacent
> partitions can happen in parallel. That is why I think it will be safer if

Ah, I see, this is because cpuset.cpus.reserve is only in the root cgroup,
so you can't say that the knob is owned by the parent cgroup and thus access
is controlled that way.

...
> > > There are two types of partitions - adjacent and remote. The
> > > parent of an adjacent partition must be a valid partition root.
> > > Partition roots of adjacent partitions are all clustered around
> > > the root cgroup. Creation of an adjacent partition is done by
> > > writing the desired partition type into "cpuset.cpus.partition".
> > >
> > > A remote partition does not require a partition root parent.
> > > So a remote partition can be formed far from the root cgroup.
> > > However, its creation is a 2-step process. The CPUs needed
> > > by a remote partition ("cpuset.cpus" of the partition root)
> > > have to be written into "cpuset.cpus.reserve" of the root
> > > cgroup first. After that, "isolated" can be written into
> > > "cpuset.cpus.partition" of the partition root to form a remote
> > > isolated partition, which is the only supported remote partition
> > > type for now.
> > >
> > > All remote partitions are terminal, as adjacent partitions cannot
> > > be created underneath them.
> >
> > Can you elaborate this extra restriction a bit further?
>
> Are you referring to the fact that only remote isolated partitions are
> supported? I do not preclude the support of load balancing remote
> partitions. I keep it to isolated partitions for now for ease of
> implementation and I am not currently aware of a use case where such a
> remote partition type is needed.
>
> If you are talking about remote partitions being terminal, it is mainly
> because it can be more tricky to support hierarchical adjacent partitions
> underneath them, especially if they are not isolated. We can certainly
> support it if a use case arises. I just don't want to implement code that
> nobody is really going to use.
>
> BTW, with the current way the remote partition is created, it is not
> possible to have another remote partition underneath it.

The fact that the control is spread across a root-only file and a per-cgroup
file seems hacky to me, e.g., how would it interact with namespacing? Are
there reasons why this can't be properly hierarchical other than the amount
of work needed? For example:

  cpuset.cpus.exclusive is a per-cgroup file and represents the mask of CPUs
  that the cgroup holds exclusively. The mask is always a subset of
  cpuset.cpus. The parent loses access to a CPU when the CPU is given to a
  child by setting the CPU in the child's cpus.exclusive and the CPU can't
  be given to more than one child. IOW, exclusive CPUs are available only to
  the leaf cgroups that have them set in their .exclusive file.

  When a cgroup is turned into a partition, its cpuset.cpus and
  cpuset.cpus.exclusive should be the same. For backward compatibility, if
  the cgroup's parent is already a partition, cpuset will automatically
  attempt to add all cpus in cpuset.cpus into cpuset.cpus.exclusive.

I could well be missing something important but I'd really like to see
something like the above where the reservation feature blends in with the
rest of cpuset.
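
To make the cpuset.cpus.exclusive rules above a bit more concrete, here is a
rough userspace model in Python. It is only a sketch of the proposed
semantics, not kernel code and not an existing interface: the Cgroup class
and its set_exclusive(), effective_cpus() and make_partition() methods are
made-up names, and the way effective CPUs are derived for siblings is my
assumption about what "the parent loses access" would imply.

```python
class Cgroup:
    """Toy model of one cgroup under the proposed rules (hypothetical)."""

    def __init__(self, name, cpus, parent=None):
        self.name = name
        self.cpus = set(cpus)        # models cpuset.cpus
        self.exclusive = set()       # models cpuset.cpus.exclusive
        self.parent = parent
        self.children = []
        self.is_partition = False
        if parent is not None:
            parent.children.append(self)

    def set_exclusive(self, mask):
        mask = set(mask)
        # The exclusive mask is always a subset of cpuset.cpus.
        if not mask <= self.cpus:
            raise ValueError(f"{self.name}: exclusive must be a subset of cpus")
        # A CPU can't be given to more than one child of the same parent.
        if self.parent is not None:
            for sibling in self.parent.children:
                if sibling is not self and sibling.exclusive & mask:
                    raise ValueError(
                        f"{self.name}: CPU already held by {sibling.name}")
        self.exclusive = mask

    def effective_cpus(self):
        # Assumption: a cgroup can use whatever its parent can still hand
        # out, plus what it holds exclusively, limited to its own cpus.
        if self.parent is None:
            available = set(self.cpus)
        else:
            available = (self.parent.effective_cpus() | self.exclusive) & self.cpus
        # The parent loses every CPU that a child holds exclusively.
        for child in self.children:
            available -= child.exclusive
        return available

    def make_partition(self):
        # Backward compatibility: under a partition parent, try to claim
        # all of cpuset.cpus as exclusive automatically.
        if self.parent is not None and self.parent.is_partition:
            self.set_exclusive(self.cpus)
        # A partition needs cpuset.cpus == cpuset.cpus.exclusive.
        if self.exclusive != self.cpus:
            raise ValueError(f"{self.name}: cpus and exclusive differ")
        self.is_partition = True


root = Cgroup("root", {0, 1, 2, 3})
root.is_partition = True             # the root is a partition in this model
a = Cgroup("a", {0, 1, 2, 3}, parent=root)
b = Cgroup("b", {0, 1, 2, 3}, parent=root)
a.set_exclusive({2, 3})

print(sorted(root.effective_cpus()))  # [0, 1]
print(sorted(a.effective_cpus()))     # [0, 1, 2, 3]
print(sorted(b.effective_cpus()))     # [0, 1]
```

In this model, b.set_exclusive({3}) fails because a already holds CPU 3, so
access control falls out of ordinary parent/child ownership rather than a
root-only file.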