Re: [PATCH v2 3/3] sched/fair: fix unnecessary increase of balance interval

From: Valentin Schneider
Date: Wed Dec 19 2018 - 06:16:29 EST


On 19/12/2018 08:27, Vincent Guittot wrote:
[...]
>> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
>> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
>> at least one task we could pull, even if we don't pull it because of
>> other reasons in can_migrate_task() (e.g. cache hotness).
>>
>> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
>> increase the balance_interval, which is what we would want to maintain.
>
> But there are several other UC to do active migration and increase the interval
> like all except running tasks are pinned
>

My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
we care about, although the one you're mentioning is the only one I can
think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
the active balance we'd know it's because all other tasks were pinned so
we should probably increase the interval (see last snippet I sent).

[...]
>> So that's all the need_active_balance() cases except the last
>> sd->nr_balance_failed one. I'd argue this could also be counted as a
>> "good" reason to active balance which shouldn't lead to a balance_interval
>> increase. Plus, it keeps to the logic of increasing the balance_interval
>> only when task affinity gets in the way.
>
> But this is some kind of affinity, the hotness is a way for the
> scheduler to temporarily pinned the task on a cpu to take advantage of
> cache hotness.
>
> I would prefer to be conservative and only reset the interval when we
> are sure that active load balance is really what we want to do.
> Asym packing is one, we can add the misfit case and the move task on
> cpu with more available capacity as well. For the other case, it's
> less obvious and I would prefer to keep current behavior
>

Mmm ok so this one is kinda about semantics on what do we really consider
as "pinning".

If we look at the regular load_balance() path, if all tasks are cache hot
(so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't
increase the balance_interval. Actually, if we have !active_balance we'll
reset it.

Seeing as the duration of a task's cache hotness (default .5ms) is small
compared to any balance_interval (1ms * sd_weight), IMO it would make sense
to reset the interval for all active balance cases. On top of that, we
would keep to the current logic of increasing the balance_interval *only*
when task->cpus_allowed gets in the way.