On Fri, 2004-10-08 at 17:18, Nick Piggin wrote:
> Matthew Dobson wrote:
> > I think this example is easily achievable with the sched_domains
> > modifications I am proposing. You can still create your 128 CPU
> > exclusive domain, called big_domain (due to my lack of naming
> > creativity), and further divide big_domain into smaller, non-exclusive
> > sched_domains. We do this all the time, albeit statically at boot time,
> > with the current sched_domains code. When we create a 4-node domain on
> > IA64, we create 4 1-node domains underneath it. We've then partitioned
> > the system into 4 sched_domains, each containing 4 cpus. Balancing
> > between these 4 node-level sched_domains is allowed, but can be
> > disallowed by not setting the SD_LOAD_BALANCE flag. Your example
> > does show that it can be more than just a convenient way to group tasks,
> > but your example can be done with what I'm proposing.
>
> You wouldn't be able to do this just with sched domains, because
> it doesn't know anything about individual tasks. As soon as you
> have some overlap, all your tasks can escape out of your domain.
>
> I don't think there is a really nice way to do overlapping sets.
> Those that want them need to just use cpu affinity for now.

Well, the tasks can escape out of the domain iff you have the
SD_LOAD_BALANCE flag set on that domain. If SD_LOAD_BALANCE isn't set,
then when the scheduler tick goes off, and the code looks at the domain,
it will see the lack of the flag and will not attempt to balance the
domain, correct? This is what we currently do with the 'isolated'
domains, right?
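
To make the question concrete, here is a rough toy model of the behaviour I have in mind. It is not the kernel code: struct toy_domain, toy_can_migrate() and the flag value are made up for illustration, and the only thing borrowed from the real scheduler is the idea of walking up the domain hierarchy and skipping any level that lacks SD_LOAD_BALANCE.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; not taken from the kernel headers. */
#define SD_LOAD_BALANCE 0x1u

/* Hypothetical, stripped-down stand-in for struct sched_domain. */
struct toy_domain {
	struct toy_domain *parent;	/* next level up, NULL at the top */
	unsigned long span;		/* bitmask of CPUs this level covers */
	unsigned int flags;
};

/*
 * Could the balancer, starting from a CPU's lowest-level domain, ever
 * move a task to dst_cpu?  Levels without SD_LOAD_BALANCE are skipped,
 * so they never give tasks a path out of the levels below them.
 */
static int toy_can_migrate(const struct toy_domain *base, int dst_cpu)
{
	const struct toy_domain *sd;

	assert(dst_cpu >= 0 && dst_cpu < (int)(sizeof(unsigned long) * 8));

	for (sd = base; sd != NULL; sd = sd->parent) {
		if (!(sd->flags & SD_LOAD_BALANCE))
			continue;	/* the tick ignores this level */
		if (sd->span & (1UL << dst_cpu))
			return 1;	/* dst_cpu reachable at this level */
	}
	return 0;
}
```

With a 4-cpu node domain (span 0x000f, flag set) sitting under a 16-cpu top-level domain (span 0xffff, flag clear), toy_can_migrate() says yes for cpu 2 and no for cpu 5. Set the flag on the top-level domain and cpu 5 becomes reachable, which is the "escape" case.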
You're right that you can get some of the more obscure semantics of the
various flavors of cpusets by leveraging sched_domains AND
cpus_allowed. I don't have any desire to remove that ability, just keep
it as the exception.