Re: [ckrm-tech] Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

From: Matthew Dobson
Date: Tue Oct 12 2004 - 16:30:10 EST


On Tue, 2004-10-12 at 01:50, Simon Derr wrote:
> > One of the cool things about using sched_domains as your partitioning
> > element is that in reality, tasks run on *CPUs*, not *domains*. So if
> > you have threads 'a1' & 'a2' running on CPUs 0 & 1 (small job 'a') and
> > threads 'b1' & 'b2' running on CPUs 2 & 3 (small job 'b'), you can
> > suspend threads a1, a2, b1 & b2 and remove the domains they were running
> > in to allow job A (big job with threads A1, A2, A3, & A4) to run on the
> > larger 4 CPU domain. When you then suspend A1-A4 again to allow the
> > smaller jobs to proceed, you can pretty trivially create the 2 CPU
> > domains underneath the 4 CPU domain and resume the jobs. Those jobs (a
> > & b) have been suspended on the CPUs they were originally running on,
> > and thus will resume on the same CPUs without any extra effort. They
> > will simply run on those CPUs, and at load balance time, the domains
> > attached to those CPUs will be consulted to determine where the tasks
> > can be relocated to if there is a heavy load. The domains will tell the
> > scheduler that the tasks cannot be relocated outside the 2 CPUs in each
> > respective domain. Viola! (sorta ;)
> Voilà ;-)

hehe... My French spelling obviously isn't quite up to par. ;)


> I agree that this looks really smooth from a scheduler point of view.
>
> From a user point of view, the issue of suspending the tasks remains:
> - find which tasks to suspend: how do you know that job 'a' consists
>   exactly of 'a1' and 'a2'?
> - suspend them (btw, how do you achieve this? kill -STOP?)
>
>
> I've been away from my mail and am still trying to catch up, so never mind
> if the above does not make sense to you.
>
> Simon.

Paul didn't go into specifics about how to suspend the job, so neither
did I. Sending SIGSTOP & SIGCONT should work, as you mention... Those
are implementation details which really aren't *that* important to the
discussion. We're still trying to figure out the overall framework and
API to work with, so which method of suspending a thread we'll
eventually use can be tackled down the road. :)
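
For illustration only, here is a minimal userspace sketch of the
SIGSTOP/SIGCONT approach mentioned above. It assumes the PIDs making up a
job are already known; how a job's tasks get identified is still an open
question in this thread. The helper names suspend_job and resume_job are
hypothetical and not part of any proposed API.

```python
import os
import signal


def suspend_job(pids):
    """Send SIGSTOP to every process in the job.

    A stop signal halts the whole process, including all of its threads.
    Returns the list of PIDs that were signalled.
    """
    signalled = []
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)  # SIGSTOP cannot be caught or ignored
        signalled.append(pid)
    return signalled


def resume_job(pids):
    """Send SIGCONT to every process in the job so it runs again."""
    signalled = []
    for pid in pids:
        os.kill(pid, signal.SIGCONT)
        signalled.append(pid)
    return signalled
```

A stopped task keeps its CPU affinity, so once it is resumed the scheduler
again places it according to whatever domains are attached to its CPUs at
that point.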

-Matt
