Re: [RFC] cpuset: remove sched domain hooks from cpusets

From: Nick Piggin
Date: Thu Oct 19 2006 - 15:22:32 EST


Paul Jackson wrote:
Nick wrote:

You shouldn't need to, assuming cpusets doesn't mess it up.


I'm guessing we're agreeing that the routines update_cpu_domains()
and related code in kernel/cpuset.c are messing things up.

At the moment they are, yes.

I view that code as a failed intrustion of some sched domain code into
cpusets, and apparently you view that code as a failed attempt to
manage sched domains coming from cpusets.

Oh well ... finger pointing is such fun ;).

:)

I don't know about finger pointing, but the sched-domains partitioning
works. It does what you ask of it, which is to partition the
multiprocessor balancing.

+ non_partitioned = top_cpuset.cpus_allowed;
+ update_cpu_domains_children(&top_cpuset, &non_partitioned);
+ partition_sched_domains(&non_partitioned);


So ... instead of throwing the baby out, you want to replace it
with a puppy. If one attempt to overload cpu_exclusive didn't
work, try another.

It isn't overloading anything. Your cpusets code has assigned a
particular semantic to cpu_exclusive. It so happens that we can
take advantage of this knowledge in order to do a more efficient
implementation.

It doesn't suddenly become a flag to manage sched-domains; its
semantics are completely unchanged (modulo bugs). The cpuset
interface semantics have no connection to sched-domains.

Put it this way: you don't think your code is currently
overloading the cpuset cpus_allowed setting in order to set the
task's cpus_allowed field, do you? You shouldn't need a flag to
tell it to set that, it is all just the mechanism behind the
policy.

I have two problems with this.

1) I haven't found any need for this, past the need to mark some
CPUs as isolated from the scheduler balancing code, which we
seem to be agreeing on, more or less, on another patch.

Please explain why we need this or any such mechanism for user
space to affect sched domain partitioning.

Until very recently, the multiprocessor balancing could easily be very
stupid when faced with cpus_allowed restrictions. This is somewhat
fixed, but it is still suboptimal compared to a sched-domains partition
when you are dealing with disjoint cpusets.

It is mostly SGI who seem to be running into these balancing issues, so
I would have thought this would be helpful for your customers primarily.

I don't know of anyone else using cpusets, but I'd be interested to know.

2) I've had better luck with the cpuset API by adding new flags
when I needed some additional semantics, rather than overloading
existing flags. So once we figure out what's needed and why,
then odds are I will suggest a new flag, specific to that purpose.

There is no new semantic beyond what is already specified by
cpu_exclusive.


This new flag might well logically depend on the cpu_exclusive
setting, if that's useful. But it would probably be a separate
flag or setting.

I dislike providing explicit mechanisms via implicit side affects.

This is more like providing a specific implementation for a given
semantic.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/