Re: [PATCH] sched/fair: Update rq_clock, cfs_rq before migrating for asym cpu capacity

From: Peter Zijlstra
Date: Tue Jul 09 2019 - 09:51:02 EST


On Tue, Jul 09, 2019 at 12:57:59PM +0100, Chris Redpath wrote:
> The ancient workaround to avoid the cost of updating rq clocks in the
> middle of a migration causes some issues on asymmetric CPU capacity
> systems where we use task utilization to determine which cpus fit a task.
> On quiet systems we can end up with inflated task util after a
> migration, which causes misfit to fire and force-migrate the task.
>
> This occurs when:
>
> (a) a task has util close to the non-overutilized capacity limit of a
> particular cpu (cpu0 here); and
> (b) the prev_cpu was quiet otherwise, such that rq clock is
> sufficiently out of date (cpu1 here).
>
> e.g.
>                                _____
> cpu0: ________________________|     |______________
>
>                               |<- misfit happens
>            ______              ___       ___
> cpu1: ____|      |____________|   |_____|   |_____
>
>                ->|            |<- wakeup migration time
>            last rq clock update
>
> When the task util is in just the right range for the system, we can end
> up migrating an unlucky task back and forth many times until we are lucky
> and the source rq happens to be updated close to the migration time.
>
> In order to address this, let's update both rq_clock and cfs_rq where
> this could be an issue.
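
To make the failure mode concrete, here is a stand-alone toy model of
the effect (this is not the kernel's PELT code; the ~32ms half-life
matches PELT, but decay() and the util/sleep/staleness numbers are
made up for illustration):

/*
 * Toy model of the inflated-util problem: a PELT-like geometric
 * decay with a ~32ms half-life, applied only to the part of the
 * sleep that a (possibly stale) rq clock can actually see.
 */
#include <math.h>
#include <stdio.h>

/* Halve util for every 32ms of sleep visible to the clock. */
static double decay(double util, double visible_sleep_ms)
{
	return util * pow(0.5, visible_sleep_ms / 32.0);
}

int main(void)
{
	double util = 800.0;		/* close to cpu0's fit limit */
	double slept_ms = 200.0;	/* real time the task slept */
	double stale_ms = 180.0;	/* how far prev_cpu's rq clock lags */

	/* Fresh clock: the whole sleep decays away and the task fits. */
	printf("fresh clock: util = %.0f\n", decay(util, slept_ms));

	/* Stale clock: only 20ms of the sleep is visible, so the task
	 * still looks big at migration time and misfit can fire. */
	printf("stale clock: util = %.0f\n", decay(util, slept_ms - stale_ms));

	return 0;
}

With a fresh clock this prints util = 11; with the stale clock it
prints util = 519, which is the inflated value the wake placement and
misfit code then act on.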

Can you quantify how much of a problem this really is? It is really sad,
but this is already the second place where we take rq->lock on
migration. We worked so hard to avoid having to acquire it :/

> Signed-off-by: Chris Redpath <chris.redpath@xxxxxxx>
> ---
>  kernel/sched/fair.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b798fe7ff7cd..51791db26a2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6545,6 +6545,21 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>  	 * wakee task is less decayed, but giving the wakee more load
>  	 * sounds not bad.
>  	 */
> +	if (static_branch_unlikely(&sched_asym_cpucapacity) &&
> +			p->state == TASK_WAKING) {

nit: indent fail.

> +		/*
> +		 * On asymmetric capacity systems task util guides
> +		 * wake placement so update rq_clock and cfs_rq
> +		 */
> +		struct cfs_rq *cfs_rq = task_cfs_rq(p);
> +		struct rq *rq = task_rq(p);
> +		struct rq_flags rf;
> +
> +		rq_lock_irqsave(rq, &rf);
> +		update_rq_clock(rq);
> +		update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq);
> +		rq_unlock_irqrestore(rq, &rf);
> +	}
>  	remove_entity_load_avg(&p->se);
>  }
>
> --
> 2.17.1
>