Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

From: Paul Jackson
Date: Tue Apr 19 2005 - 11:22:21 EST


Simon wrote:
> I guess we hit a limit of the filesystem-interface approach here.
> Are the possible failure reasons really that complex ?

Given the amount of head scratching my proposal has provoked
so far, they might be that complex, yes. Failure reasons
include:
* The cpuset Foo whose domain_cpu_rebuild file we wrote does
not align with the current partition of CPUs on the system
(align: every subset of the partition is either entirely
within or entirely outside the CPUs of Foo).
* The cpusets Foo and its descendants which are marked with
a true domain_cpu_pending do not form a partition of the
CPUs in Foo. This could be either because two of these
cpusets have overlapping CPUs, or because the union of all
the CPUs in these cpusets doesn't cover all the CPUs in Foo.
* The usual other reasons such as lacking write permission.
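
To make the first two failure reasons concrete, here is a rough
user-space sketch in Python. It is not kernel code and not part of the
proposal; the function names, the cpuset names and the hint wording are
all made up for illustration, and CPUs are modelled as plain sets of
integers.

```python
from itertools import combinations


def aligns(foo_cpus, partition):
    """True if every block of the partition lies entirely inside
    or entirely outside foo_cpus (the 'align' condition)."""
    foo = set(foo_cpus)
    return all(set(block) <= foo or set(block).isdisjoint(foo)
               for block in partition)


def partition_hint(foo_cpus, marked):
    """Return None if the marked cpusets partition foo_cpus,
    otherwise a human-readable hint naming the problem.

    marked maps a (hypothetical) cpuset name to its set of CPUs.
    """
    foo = set(foo_cpus)
    # Overlap: two marked cpusets share at least one CPU.
    for a, b in combinations(sorted(marked), 2):
        common = set(marked[a]) & set(marked[b])
        if common:
            return f"{a} and {b} overlap on CPUs {sorted(common)}"
    # Cover: the union of the marked cpusets must reach every CPU in Foo.
    missing = foo - set().union(*marked.values())
    if missing:
        return f"CPUs {sorted(missing)} of Foo are not covered"
    return None
```

For example, with Foo holding CPUs 0-3, marking A = {0, 1, 2} and
B = {2, 3} trips the overlap case, while marking only A = {0, 1}
trips the cover case.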

> If this is only to get a hint, OK.

Yes - it would be a hint. The official explanation would be the
errno setting on the failed write. The hint, written to the
domain_cpu_error file, could actually state which two cpusets
had overlapping CPUs, or which CPUs in Foo were not covered by
the union of the CPUs in the marked descendant cpusets.
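
From user space, the errno-plus-hint scheme might be consumed roughly
as sketched below. This is only an illustration: domain_cpu_rebuild and
domain_cpu_error are the files proposed in this thread, not an existing
kernel interface, and both the helper name trigger_rebuild and the
value "1" written to the trigger file are assumptions.

```python
import os


def trigger_rebuild(cpuset_dir):
    """Write to the proposed domain_cpu_rebuild file in cpuset_dir.

    Returns (0, "") on success, or (errno, hint) on failure, where
    hint is whatever the proposed domain_cpu_error file contains
    (empty if that file cannot be read).
    """
    rebuild = os.path.join(cpuset_dir, "domain_cpu_rebuild")
    error = os.path.join(cpuset_dir, "domain_cpu_error")
    try:
        with open(rebuild, "w") as f:
            f.write("1")  # assumed trigger value, not specified in the thread
    except OSError as e:
        # The errno is the official explanation; the file is only a hint.
        try:
            with open(error) as f:
                hint = f.read().strip()
        except OSError:
            hint = ""
        return e.errno, hint
    return 0, ""
```

The caller decides what to do based on the errno alone; the hint text
is meant for a human reading the failure, not for parsing.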

Yes - it is pushing the limits of available mechanisms. Though I don't
offhand see where the filesystem-interface approach is to blame here.
Can you describe any other approach that would provide a similarly
useful error explanation in a less unusual fashion?

> Is such an error reporting scheme already in use in the kernel ?

I don't think so.

> On the other hand, there's also no guarantee that what we are triggering
> by writing in domain_cpu_rebuild is what we have set up by writing in
> domain_cpu_pending. User applications will need a bit of self-discipline.

True.

To preserve the invariant that the CPUs in the selected cpusets form a
partition (disjoint cover) of the system's CPUs, we either need to
provide an atomic operation that allows passing in a selection of
cpusets, or we need to provide a sequence of operations that essentially
drive a little finite state machine, building up a description of the
new state while the old state remains in place, until the final trigger
is fired.
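
The second option, a small state machine, can be modelled in a few
lines of Python. This is a toy sketch under stated assumptions: the
class and method names are hypothetical, it treats the whole system
rather than a subtree, and it stands in for what writes to
domain_cpu_pending and domain_cpu_rebuild would do.

```python
class StagedPartition:
    """Build up a new partition while the old one stays in place."""

    def __init__(self, system_cpus, active):
        self.system_cpus = frozenset(system_cpus)
        self.active = [frozenset(b) for b in active]  # current partition
        self.pending = {}                             # name -> CPUs, staged

    def mark(self, name, cpus):
        """Stage a cpuset (like setting domain_cpu_pending to true)."""
        self.pending[name] = frozenset(cpus)

    def unmark(self, name):
        """Withdraw a staged cpuset."""
        self.pending.pop(name, None)

    def trigger(self):
        """Fire the rebuild. On failure the active partition and the
        staged marks are both left untouched."""
        blocks = list(self.pending.values())
        union = frozenset().union(*blocks)
        disjoint = sum(len(b) for b in blocks) == len(union)
        if not blocks or not disjoint or union != self.system_cpus:
            return False
        self.active = blocks
        self.pending = {}
        return True
```

Nothing stops a caller from firing the trigger before all the marks
are in place; the trigger simply fails and the old partition survives,
which is the self-discipline point made above.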

This suggests what the primary alternative to my proposed API would be,
and that would be an interface that let one pass in a list of cpusets,
requesting that the partition below the specified cpuset subtree Foo be
completely and atomically rebuilt, to be that defined by the list of
cpusets, with the set of CPUs in each of these cpusets defining one
subset of the partition.
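
A sketch of what such an atomic interface would have to check, again
in Python and again only as a model: rebuild_partition is a
hypothetical name, and partitions are represented as lists of CPU
sets rather than anything the kernel actually uses.

```python
def rebuild_partition(current, foo_cpus, cpusets):
    """Return a new partition in which the blocks inside foo_cpus are
    replaced by the given cpusets.

    Raises ValueError, leaving current untouched, if Foo does not
    align with the current partition or if the cpusets do not form
    a partition (disjoint cover) of Foo.
    """
    foo = frozenset(foo_cpus)
    old = [frozenset(b) for b in current]
    new = [frozenset(c) for c in cpusets]

    # Every existing block must be wholly inside or wholly outside Foo.
    for block in old:
        if not (block <= foo or block.isdisjoint(foo)):
            raise ValueError("Foo does not align with the current partition")

    # Non-empty blocks whose union is Foo and whose sizes add up to
    # |Foo| are necessarily disjoint and covering.
    union = frozenset().union(*new)
    if any(not b for b in new) or union != foo \
            or sum(len(b) for b in new) != len(foo):
        raise ValueError("cpusets do not form a partition of Foo")

    # Keep the blocks outside Foo, swap in the new ones inside it.
    return [b for b in old if b.isdisjoint(foo)] + new
```

Because the whole list arrives in one call, there is no intermediate
state to get wrong: either the new partition is returned complete, or
the call fails and the old one stands.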

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxxxxxxx> 1.650.933.1373, 1.925.600.0401