Re: boot cgroup questions

From: Max Krasnyanskiy
Date: Wed Mar 12 2008 - 18:29:50 EST


Paul Jackson wrote:

This is what I meant by "deeper hierarchies" in the earlier emails.

These deeper hierarchies create an incompatibility in some common uses
of cpusets.

When my example had cpusets A, B and C, that was as stated, not as
might be modified to X, X/A, X/B and C.

If the user has or would have set up cpusets A, B and C because that's
what they needed to manage the CPU and Memory Node placement of their
tasks, then that's what they might have set up, and there is a good
chance that they would find the imposition of the extra 'X' cpuset to
be a problem, to require more code and to be a cause of bugs.

Isn't that just an issue of planning? Those cpusets are not cast in stone, are they? I mean, yes, users have set up A, B and C the way they did because that's what they needed. Now their plans/requirements have changed. They now also want to manage irqs via cpusets, and in order to do that they need to replan/redo the partitioning.

In order to manage irqs the code has to change anyway, because currently there is no way to do that via cpusets. The users would have two options:
1. keep all irqs in the top set and manage them individually via /proc
2. lay out cpusets differently
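
To make option 1 concrete, here is a minimal sketch of managing a single irq by hand through /proc/irq/<N>/smp_affinity. The helper names (cpus_to_mask, set_irq_affinity) and the proc_root parameter are made up for illustration, and the comma-separated 32-bit hex format is my understanding of what the kernel accepts, so treat this as a sketch rather than a reference:

```python
from pathlib import Path


def cpus_to_mask(cpus):
    """Build an smp_affinity style hex mask from an iterable of CPU numbers.

    Assumption: the mask is written as 32-bit hex groups separated by
    commas, most significant group first.
    """
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    groups = []
    while True:
        groups.append(f"{mask & 0xffffffff:08x}")
        mask >>= 32
        if mask == 0:
            break
    return ",".join(reversed(groups))


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity.

    proc_root is a hypothetical knob so the helper can be pointed at a
    scratch directory; on a real system this needs root.
    """
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity"
    path.write_text(cpus_to_mask(cpus) + "\n")
    return path
```

Something like set_irq_affinity(16, [0, 1]) would then have to be repeated for every irq, which is exactly the per-irq bookkeeping that the cpuset approach is meant to avoid.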

btw I still do not see the "incompatibility" argument, probably because I have no idea how the software you're talking about is designed. Are you saying that the software relies on flat cpuset partitioning, i.e. that it will break if users add extra cpuset levels?

Adding irqs to the cpuset hierarchy isn't free; it can further overload
the hierarchy, with "deeper hierarchies" as you state.

If instead of deeper hierarchies, we allow the same irq to be listed in
more than one cpuset (unlike tasks, which only get one cpuset) then we
need some way, independent of the cpuset hierarchy, to determine how to
resolve conflicts. We can't just add all the cpus together, allowing an
irq to be directed to any CPU which is listed in any cpuset that
accepts that irq, because a major use for this is to remove irqs from
certain realtime CPUs.

This sounds like overkill and, as you pointed out, it is not even clear how it'd work.
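
A toy illustration of the conflict Paul describes, assuming an irq could be listed in several cpusets and the allowed CPUs were simply added together. The cpuset names, CPU numbers and the dictionary layout are invented for the example; nothing here is an existing kernel interface:

```python
def union_affinity(irq, cpusets):
    """Naive rule: an irq may run on any CPU of any cpuset that lists it."""
    allowed = set()
    for cs in cpusets.values():
        if irq in cs["irqs"]:
            allowed |= cs["cpus"]
    return allowed


# Hypothetical layout: CPUs 2-3 are meant to be kept free of irqs.
cpusets = {
    "system":       {"cpus": {0, 1, 2, 3}, "irqs": {16}},
    "housekeeping": {"cpus": {0, 1},       "irqs": {16}},
    "realtime":     {"cpus": {2, 3},       "irqs": set()},
}

allowed = union_affinity(16, cpusets)
# CPUs of the realtime set that irq 16 could still be routed to.
leaked = allowed & cpusets["realtime"]["cpus"]
```

With the union rule, irq 16 stays eligible for CPUs 2 and 3 because "system" lists it, even though "housekeeping" tried to confine it to CPUs 0 and 1. Some extra rule outside the hierarchy would be needed to decide which listing wins.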

Looks like we have a trade-off here:
1. use the simple "irq == pseudo-task" concept and potentially break some existing software. We do have a working solution.
2. come up with something else, at the expense of more complex irq management rules. We do not have a working solution.

My vote goes for #1 :).
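
For clarity, this is roughly the model behind #1, written as a small userspace sketch and not as the actual patch. An irq behaves like a task: it lives in exactly one cpuset at a time and inherits that cpuset's CPUs. The class names, the irq number and the X, X/A, X/B, C layout with its CPU ranges are only illustrative assumptions:

```python
class Cpuset:
    def __init__(self, name, cpus, parent=None):
        cpus = frozenset(cpus)
        # A child cpuset may only use CPUs that its parent has.
        if parent is not None and not cpus <= parent.cpus:
            raise ValueError(f"{name}: cpus must be a subset of the parent's")
        self.name = name
        self.cpus = cpus
        self.parent = parent
        self.irqs = set()

    def path(self):
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


class IrqPlacement:
    """Tracks the single cpuset each irq currently belongs to."""

    def __init__(self, root):
        self.root = root
        self.home = {}

    def attach(self, irq, cpuset):
        # Moving an irq removes it from its previous cpuset, like a task.
        old = self.home.get(irq)
        if old is not None:
            old.irqs.discard(irq)
        cpuset.irqs.add(irq)
        self.home[irq] = cpuset

    def affinity(self, irq):
        # irqs that were never attached stay in the top cpuset.
        return self.home.get(irq, self.root).cpus


root = Cpuset("", range(8))
x = Cpuset("X", range(4), parent=root)
a = Cpuset("A", {0, 1}, parent=x)
b = Cpuset("B", {2, 3}, parent=x)
c = Cpuset("C", range(4, 8), parent=root)

placement = IrqPlacement(root)
placement.attach(16, x)   # irq 16 may use any CPU of X
placement.attach(16, c)   # moving it takes it out of X
```

Because each irq has one home, there is nothing to resolve: the affinity is whatever the owning cpuset says, and the cost is the extra X level in the layout.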

So ... if the natural hierarchy needed to map irqs to CPUs is not a
subset of the natural hierarchy needed to map tasks to sets of CPUs and
Nodes, then we either deepen the hierarchy (cross product of the
two maps, essentially) or we allow the same irq to be listed in
multiple cpusets and provide some alternative mechanism, outside the
hierarchy, to resolve the resulting conflicts in the irq to CPU map.

I think by natural you mean "compatible with existing sw". What is unnatural about extra levels of cpusets? If I read the cgroup/cpuset documentation correctly, it seems to imply that nested cgroups/cpusets are allowed.

Max

