Re: sched_domains SD_BALANCE_FORK and sched_balance_self

From: Martin J. Bligh
Date: Tue Aug 09 2005 - 17:20:35 EST

--On Tuesday, August 09, 2005 15:03:32 -0700 "Siddha, Suresh B" <suresh.b.siddha@xxxxxxxxx> wrote:

> On Fri, Aug 05, 2005 at 04:29:45PM -0700, Darren Hart wrote:
>> I have some concerns as to the intent vs. actual implementation of
>> SD_BALANCE_FORK and the sched_balance_fork() routine.
>
> Intent and implementation match. Problem is with the intent ;-)
>
> This has the intent info.
>
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=147cbb4bbe991452698f0772d8292f22825710ba
>
> To solve these issues, we need to make the sched domain and its parameters
> CMP aware. And dynamically we need to adjust these parameters based
> on the system properties.

Can you explain the purpose of doing balance on both fork and exec?
The reason we did it at exec time is that it's much cheaper to do
than at fork - you have very, very little state to deal with. The vast
majority of things that fork will exec immediately thereafter.

Balance on clone makes some sort of sense, since you know they're not
going to exec afterwards. We've thrashed through this many times before
and decided that unless there was an explicit hint from userspace,
balance on fork was not a good thing to do in the general case. Not only
based on a large range of testing, but also previous experience from other
Unixes. What new data came forth to change this?
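
Roughly, the policy described above could be sketched like this. It is an
illustration only, not kernel code: the enum, the function name and the
userspace_hint flag are all made up for the example.

```c
#include <stdbool.h>

/* Hypothetical event types; these names are not taken from the kernel. */
enum spawn_event { EVENT_FORK, EVENT_CLONE, EVENT_EXEC };

/*
 * Decide whether to look for a better CPU at this point.
 * userspace_hint stands in for "an explicit hint from userspace";
 * how such a hint would be passed is deliberately left open.
 */
bool should_balance(enum spawn_event ev, bool userspace_hint)
{
    switch (ev) {
    case EVENT_EXEC:
        return true;            /* very little state to move at exec */
    case EVENT_CLONE:
        return true;            /* no exec will follow, so place it now */
    case EVENT_FORK:
        return userspace_hint;  /* most forks exec right away; wait for that */
    }
    return false;
}
```

A plain fork without a hint is left alone, on the assumption that the exec
that usually follows is the cheaper place to balance.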

>> It seems to me that the best CPU for a forked process would be an idle
>> CPU on the same node as the parent in order to stay close to its memory.
>> Failing this, we may need to move to other nodes if they are idle enough
>> to warrant the move across node boundaries. Thoughts?
>
> We can choose the least loaded CPU in the home node and we can let the
> load balancer move it to other nodes if there is an imbalance.

Is that what it's actually doing now? That's not what Nick told me at
Kernel Summit, but is the correct thing to do for clone, I think.
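
For clarity, "least loaded CPU in the home node" could look something like
the sketch below. This is a standalone illustration under assumed data
structures (struct cpu_info and the function name are hypothetical), not
what the scheduler actually does today.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; not the kernel's runqueue structure. */
struct cpu_info {
    int node;            /* NUMA node this CPU belongs to */
    unsigned long load;  /* illustrative load figure */
};

/*
 * Return the index of the least loaded CPU on home_node, or -1 if the
 * table holds no CPU on that node. Moving the task to another node is
 * left to the periodic load balancer and is not modelled here.
 */
int pick_cpu_in_home_node(const struct cpu_info *cpus, size_t ncpus,
                          int home_node)
{
    int best = -1;
    unsigned long best_load = ULONG_MAX;

    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i].node != home_node)
            continue;   /* stay close to the parent's memory */
        if (best < 0 || cpus[i].load < best_load) {
            best = (int)i;
            best_load = cpus[i].load;
        }
    }
    return best;
}
```

The new task never leaves the parent's node at creation time; only a later
imbalance would justify the cost of crossing node boundaries.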

> For exec, we can have the SD_BALANCE_EXEC for all the sched domains, which
> is the case today.

Yup.

M.