Re: [PATCH v3 01/12] sched: fix imbalance flag reset

From: Peter Zijlstra
Date: Wed Jul 09 2014 - 10:46:05 EST


On Wed, Jul 09, 2014 at 05:11:20PM +0530, Preeti U Murthy wrote:
> On 07/09/2014 04:13 PM, Peter Zijlstra wrote:
> > On Wed, Jul 09, 2014 at 09:24:54AM +0530, Preeti U Murthy wrote:
> >> In the example that I mention above, t1 and t2 are on the rq of cpu0;
> >> while t1 is running on cpu0, t2 is on the rq but does not have cpu1 in
> >> its cpus allowed mask. So during load balance, cpu1 tries to pull t2,
> >> cannot do so, and hence LBF_ALL_PINNED flag is set and it jumps to
> >> out_balanced. Note that there are only two sched groups at this level of
> >> sched domain: one with cpu0 and the other with cpu1. In this scenario we
> >> do not try to do active load balancing; at least that's what the code does
> >> now if LBF_ALL_PINNED flag is set.
> >
> > I think Vince is right in saying that in this scenario ALL_PINNED won't
> > be set. move_tasks() will iterate cfs_rq::cfs_tasks, that list will also
> > include the current running task.
>
> Hmm.. really? Because while dequeueing a task from the rq so as to
> schedule it on a cpu, we delete its entry from the list of cfs_tasks on
> the rq.
>
> list_del_init(&se->group_node) in account_entity_dequeue() does that.

But set_next_entity() doesn't call account_entity_dequeue(), only
__dequeue_entity() to take it out of the rb-tree.
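
To make the distinction concrete, here is a toy model, not kernel code. Every
toy_* name is hypothetical, and two booleans stand in for the rb-tree and for
the cfs_rq::cfs_tasks list. It only illustrates that picking a task to run
takes it off the tree but leaves it on the list, so a move_tasks()-style walk
would still visit it.

```c
#include <stdbool.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_entity {
    bool in_rbtree;     /* stands in for membership in the cfs_rq rb-tree */
    bool in_cfs_tasks;  /* stands in for se->group_node on cfs_rq::cfs_tasks */
};

/* Enqueue: the task becomes runnable and is tracked in both structures. */
static void toy_enqueue(struct toy_entity *se)
{
    se->in_rbtree = true;
    se->in_cfs_tasks = true;
}

/* Picked to run: models set_next_entity() doing only __dequeue_entity(). */
static void toy_set_next(struct toy_entity *se)
{
    se->in_rbtree = false;
}

/* Full dequeue: models the path that also runs account_entity_dequeue(). */
static void toy_dequeue(struct toy_entity *se)
{
    se->in_rbtree = false;
    se->in_cfs_tasks = false;
}

/* Number of entities a walk over the task list would visit. */
static int toy_count_listed(const struct toy_entity *se, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (se[i].in_cfs_tasks)
            count++;
    return count;
}
```

With t1 and t2 both enqueued and t1 then picked via toy_set_next(),
toy_count_listed() still reports both; only toy_dequeue() drops an entry.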

> > And can_migrate_task() only checks for current after the pinning bits.
> >
> >> Continuing with the above explanation; when LBF_ALL_PINNED flag is
> >> set, and we jump to out_balanced, we clear the imbalance flag for the
> >> sched_group comprising cpu0 and cpu1, although there is actually an
> >> imbalance. t2 could still be migrated to say cpu2/cpu3 (t2 has them in
> >> its cpus allowed mask) in another sched group when load balancing is
> >> done at the next sched domain level.
> >
> > And this is where Vince is wrong; note how
> > update_sg_lb_stats()/sg_imbalance() uses group->sgc->imbalance, but
> > load_balance() sets: sd_parent->groups->sgc->imbalance, so explicitly
> > one level up.
>
> One level up? The group->sgc->imbalance flag is checked during
> update_sg_lb_stats(). This flag is *set during the load balancing at a
> > lower level sched domain*. IOW, when the 'group' formed the sched domain.

sd_parent is one level up.

> >
> > So what we can do I suppose is clear 'group->sgc->imbalance' at
> > out_balanced.
>
> You mean 'set'? If we clear it we will have no clue about imbalances at
> lower level sched domains due to pinning. Specifically in LBF_ALL_PINNED
> case. This might prevent us from balancing out these tasks to other
> groups at a higher level domain. update_sd_pick_busiest() specifically
> relies on this flag to choose the busiest group.

No, clear, in load_balance. So set one level up, clear the current
level.
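
A rough sketch of that ordering, as a toy model rather than a patch. The
toy_* names are hypothetical, and a single flag per level is a simplifying
assumption standing in for the imbalance flag of the group examined at that
level.

```c
#include <stddef.h>

/* Toy model, not kernel code: one flag per level is an assumption. */
struct toy_domain {
    struct toy_domain *parent;  /* next level up, NULL at the top */
    int imbalance;              /* read when balancing at this level */
};

/* Pinned tasks got in the way at sd: record that one level up. */
static void toy_note_pinned(struct toy_domain *sd)
{
    if (sd->parent)
        sd->parent->imbalance = 1;
}

/* Balancing at sd ended up at out_balanced: clear this level's flag. */
static void toy_out_balanced(struct toy_domain *sd)
{
    sd->imbalance = 0;
}
```

In this model the lower level never writes its own flag when it sets, and
the level that reads a flag is the one that clears it.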
