Re: [PATCH] Fix tasks being forgotten for a long time on SMP

From: Peter Zijlstra
Date: Thu Sep 22 2016 - 03:34:19 EST


On Tue, Sep 20, 2016 at 06:14:34PM -0700, Yuriy Romanenko wrote:
> From e9a304ae91fa2a4427bde7d3aea18296d0ebb27f Mon Sep 17 00:00:00 2001
> From: Yuriy Romanenko <yromanenko@xxxxxxxxxxxxxx>
> Date: Tue, 20 Sep 2016 17:47:28 -0700
> Subject: [PATCH] Fix tasks being forgotten for a long time on SMP
>
> Observed occasional very high latency on an embedded SMP system between
> a task becoming ready to run and actually running with low system load,
> impacting interactive usage.
>
> A sched_wake() from CPUx on CPUy puts the task into the run queue and
> marks it runnable, but does not trigger an IPI to have the scheduler
> re-run on CPUy and see if the current task needs to get pre-empted and
> does not wake up CPUy if it is asleep.
>
> This is especially evident when a CPU is in SWFI and simply does not
> wake up even though it now has a runnable task.

WTH is SWFI, and which arch has that?

> This is probably not the most elegant fix and definitely generates some
> unnecessary scheduler runs, but it's better for overall latency.
>
> Signed-off-by: Yuriy Romanenko <yromanenko@xxxxxxxxxxxxxx>
> ---
> kernel/sched/core.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 860070f..7c334b7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1686,6 +1686,14 @@ static void ttwu_do_wakeup(struct rq *rq, struct
> task_struct *p, int wake_flags,
> trace_sched_wakeup(p);
>
> #ifdef CONFIG_SMP
> + /*
> + * If the task is not on the current cpu, there is a chance
> + * the other cpu might be asleep and will not get to our task
> + * for a really long time. Send an IPI to avoid that
> + */
> + if (task_cpu(p) != smp_processor_id())
> + smp_send_reschedule(task_cpu(p));
> +

Yeah, no, this is completely insane.

If the remote cpu is running the idle task, check_preempt_curr() should
very much wake it up; if it's not the idle class, it should never get
there because there is now an actually runnable task on it.
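
To make that expectation concrete, here is a rough userspace toy model of
the wakeup path. It is not kernel code: every model_* name is made up for
illustration, the same-class preemption decision is left out entirely, and
the polling-idle shortcut is an assumption about how an idle loop that
watches need_resched can skip the IPI. It only shows that a wakeup landing
on a remote runqueue whose current task is idle is expected to end in a
reschedule request for that CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified scheduling classes; higher value wins. */
enum model_class { MODEL_IDLE = 0, MODEL_FAIR = 1, MODEL_RT = 2 };

struct model_task {
	enum model_class cls;
};

/* Hypothetical stand-in for a per-CPU runqueue. */
struct model_rq {
	int cpu;                  /* CPU that owns this runqueue */
	struct model_task *curr;  /* task currently running there */
	bool need_resched;        /* reschedule already requested? */
	bool idle_polling;        /* assumed: idle loop polls need_resched */
	int ipis_sent;            /* counts modelled reschedule IPIs */
};

/* Ask the CPU owning rq to reschedule; kick it with an IPI if needed. */
static void model_resched_curr(struct model_rq *rq, int this_cpu)
{
	assert(rq != NULL);

	if (rq->need_resched)
		return;           /* a request is already pending */

	rq->need_resched = true;

	if (rq->cpu == this_cpu)
		return;           /* local CPU notices the flag itself */

	if (!rq->idle_polling)
		rq->ipis_sent++;  /* remote and not polling: send the IPI */
}

/* Cross-class check only: a higher class preempts a lower one. */
static void model_check_preempt_curr(struct model_rq *rq,
				     const struct model_task *p,
				     int this_cpu)
{
	if (p->cls > rq->curr->cls)
		model_resched_curr(rq, this_cpu);
}
```

In this model, waking a MODEL_FAIR task from CPU0 onto CPU1's runqueue
while CPU1 runs the idle task sets need_resched and counts one IPI. If
traces show the real kernel not doing the equivalent, that is the part
that needs explaining.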

Please explain in detail what happens and/or provide traces of this
happening.