Re: [ckrm-tech] Re: [Lse-tech] [PATCH] cpusets - big numa cpu andmemory placement

From: Simon Derr
Date: Tue Oct 12 2004 - 03:58:22 EST


> One of the cool thing about using sched_domains as your partitioning
> element is that in reality, tasks run on *CPUs*, not *domains*. So if
> you have threads 'a1' & 'a2' running on CPUs 0 & 1 (small job 'a') and
> threads 'b1' & 'b2' running on CPUs 2 & 3 (small job 'b'), you can
> suspend threads a1, a2, b1 & b2 and remove the domains they were running
> in to allow job A (big job with threads A1, A2, A3, & A4) to run on the
> larger 4 CPU domain. When you then suspend A1-A4 again to allow the
> smaller jobs to proceed, you can pretty trivially create the 2 CPU
> domains underneath the 4 CPU domain and resume the jobs. Those jobs (a
> & b) have been suspended on the CPUs they were originally running on,
> and thus will resume on the same CPUs without any extra effort. They
> will simply run on those CPUs, and at load balance time, the domains
> attached to those CPUs will be consulted to determine where the tasks
> can be relocated to if there is a heavy load. The domains will tell the
> scheduler that the tasks cannot be relocated outside the 2 CPUs in each
> respective domain. Viola! (sorta ;)
Voilà ;-)

I agree that this looks really smooth from a scheduler point of view.

>From a user point of view, remains the issue of suspending the tasks:
-find which tasks to suspend : how do you know that job 'a' consists
exactly of 'a1' and 'a2'
-suspend them (btw, how do you achieve this ? kill -STOP ?)


I've been away from my mail and still trying to catch up, nevermind if the
above does not makes sense to you.

Simon.