Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()

From: Yuyang Du
Date: Thu Jul 02 2015 - 23:29:00 EST


Hi Morten,

On Thu, Jul 02, 2015 at 12:40:32PM +0100, Morten Rasmussen wrote:
> detach_tasks() will attempt to pull 62 based on the tasks' task_h_load(),
> but the task_h_load() sum is only 5 + 10 + 0, and hence detach_tasks() will
> empty the src_rq.
>
> IOW, since task groups include blocked load in the load_avg_contrib (see
> __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the
> imbalance includes blocked load and hence env->imbalance >=
> sum(task_h_load(p)) for all tasks p on the rq. Which leads to
> detach_tasks() emptying the rq completely in the reported scenario where
> blocked load > runnable load.

Whenever I want to know how the load avg is computed for a task group, I need
to walk through the whole code again, and I would prefer not to do that this
time. But it is not as simple as saying "the 118 comes from the blocked load".

Anyway, with blocked load, yes, we definitely can't move (or even find) some
amount of the imbalance if we only look at the tasks on the queue. But this
may or may not be a problem.
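
To make the arithmetic in the quoted example concrete, here is a toy model of
the pull loop. It is only a sketch, not the kernel code: toy_detach_tasks()
and its parameters are made-up names, and it assumes the loop skips a task
whose load / 2 exceeds the remaining imbalance and stops once the imbalance
drops to zero or below.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, NOT kernel code) of a detach_tasks()-style loop.
 * h_load[i] stands in for task_h_load(p) of the i-th queued task, and
 * imbalance stands in for env->imbalance. Returns the number of tasks pulled.
 */
static size_t toy_detach_tasks(const long *h_load, size_t nr_tasks,
			       long imbalance)
{
	size_t detached = 0;

	assert(h_load != NULL || nr_tasks == 0);

	for (size_t i = 0; i < nr_tasks; i++) {
		/* Nothing left to balance: stop pulling. */
		if (imbalance <= 0)
			break;

		/* Assumed rule: skip a task that is too heavy to move. */
		if (h_load[i] / 2 > imbalance)
			continue;

		/* "Detach" the task and account for the load it carries. */
		imbalance -= h_load[i];
		detached++;
	}

	return detached;
}
```

With loads {5, 10, 0} and an imbalance of 62, the queued tasks can only cover
15 of the imbalance, so the loop never reaches its stop condition and returns
3: the src_rq is emptied.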

Firstly, the question is whether we want blocked load anywhere at all.
This is just a "now vs. average" question.

Secondly, if we stick to the average, we just need to treat the blocked load
consistently: not have the group SE include it while the task SE does not, or
have some places include it and others not.

Thanks,
Yuyang

> Whether emptying the src_rq is the right thing to do depends on your
> point of view. Does balanced load (runnable+blocked) take priority over
> keeping cpus busy or not? For idle_balance() it seems intuitively
> correct to not empty the rq and hence you could consider env->imbalance
> to be too big.
>
> I think we will see more problems of this kind if we include
> weighted_cpuload() as well. Parts of the imbalance calculation code are
> quite old and could use some attention first.
>
> A short-term fix could be what Yuyang proposes: stop pulling tasks when
> there is only one left in detach_tasks(). It won't affect active load
> balance, where we may want to migrate the last task, as active load
> balance doesn't use detach_tasks().
>
> Morten
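
The short-term fix described above can be sketched the same way. Again this
is a toy model and not the actual patch: toy_detach_keep_one() is a made-up
name, and the skip and stop rules are the same simplifying assumptions (skip
a task whose load / 2 exceeds the remaining imbalance, stop once the imbalance
is used up). The only addition is a guard that refuses to pull the last task
left on the queue.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, NOT the actual patch): same pull loop as a
 * detach_tasks()-style walk, but it stops as soon as only one task would
 * remain on the source queue. Returns the number of tasks pulled.
 */
static size_t toy_detach_keep_one(const long *h_load, size_t nr_tasks,
				  long imbalance)
{
	size_t detached = 0;

	assert(h_load != NULL || nr_tasks == 0);

	for (size_t i = 0; i < nr_tasks; i++) {
		/* Guard: never take the last task left on the queue. */
		if (nr_tasks - detached <= 1)
			break;

		/* Nothing left to balance: stop pulling. */
		if (imbalance <= 0)
			break;

		/* Assumed rule: skip a task that is too heavy to move. */
		if (h_load[i] / 2 > imbalance)
			continue;

		imbalance -= h_load[i];
		detached++;
	}

	return detached;
}
```

With loads {5, 10, 0} and an imbalance of 62 this returns 2 rather than 3, so
one task stays on the src_rq even though the imbalance is still positive.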