Re: [PATCH RT] sched: migrate_enable: Busy loop until the migration request is completed

From: Scott Wood
Date: Fri Dec 13 2019 - 01:44:43 EST


On Thu, 2019-12-12 at 12:27 +0100, Sebastian Andrzej Siewior wrote:
> If a user task changes the CPU affinity mask of a running task, a
> migration request will be dispatched if the current CPU is no longer
> allowed. This might happen shortly before the task enters a
> migrate_disable() section. Upon leaving the migrate_disable() section,
> the task will notice that the current CPU is no longer allowed and will
> dispatch its own migration request to move it off the current CPU.
> While invoking __schedule() the first migration request will be
> processed and the task returns on the "new" CPU with "arg.done = 0". Its
> own migration request will be processed shortly after and will result in
> memory corruption if the stack memory intended for the request has been
> reused for something else in the meantime.

Ugh.
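
The core problem is a use-after-scope on the on-stack request: the stopper
can still write to arg/work after the frame that holds them is gone. A
minimal userspace model of that lifetime hazard, with made-up names and
none of the actual RT code, looks roughly like this:

/* Userspace model only: a "request" posted on the requester's stack is
 * handed to another thread; if the requester returns before the handler
 * runs, the handler's store lands in a recycled stack frame. */
#include <pthread.h>
#include <unistd.h>

struct fake_migration_arg {			/* stands in for struct migration_arg */
	int done;				/* handler sets this when finished */
};

static struct fake_migration_arg *pending;	/* the "queued" request */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *fake_stopper(void *unused)
{
	(void)unused;
	sleep(1);				/* the request is handled "later" */
	pthread_mutex_lock(&lock);
	if (pending)
		pending->done = 1;		/* may scribble on a dead frame */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void broken_requester(void)
{
	struct fake_migration_arg arg = { .done = 0 };

	pthread_mutex_lock(&lock);
	pending = &arg;				/* queue a request living on our stack */
	pthread_mutex_unlock(&lock);
	/* BUG: return without waiting for arg.done; once this frame is
	 * reused, the late store above corrupts unrelated stack data.
	 * This is the case the old WARN_ON_ONCE() could only warn about. */
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, fake_stopper, NULL);
	broken_requester();
	pthread_join(t, NULL);	/* in the kernel there is no join to save us */
	return 0;
}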

> If the migration request was accepted, spin until it has been processed.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> ---
> kernel/sched/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8bea013b2baf5..5c7be96ca68c4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8227,7 +8227,7 @@ void migrate_enable(void)
>
>  	WARN_ON(smp_processor_id() != cpu);
>  	if (!is_cpu_allowed(p, cpu)) {
> -		struct migration_arg arg = { p };
> +		struct migration_arg arg = { .task = p };
>  		struct cpu_stop_work work;
>  		struct rq_flags rf;
>
> @@ -8239,7 +8239,10 @@ void migrate_enable(void)
>  		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
>  				    &arg, &work);
>  		__schedule(true);
> -		WARN_ON_ONCE(!arg.done && !work.disabled);
> +		if (!work.disabled) {
> +			while (!arg.done)
> +				cpu_relax();
> +		}

We should enable preemption while spinning: besides the general badness
of spinning with preemption disabled, there could be deadlock scenarios if
multiple CPUs are spinning in such a loop and thereby keep each other's
stopper work from running. Longer term, maybe add a way to dequeue the
no-longer-needed work instead of waiting.
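
Roughly the kind of thing I mean for the spin, as an untested sketch that
assumes preemption is still disabled at this point in migrate_enable():

		if (!work.disabled) {
			/*
			 * Let the stopper thread (and whatever it depends
			 * on) run on this CPU while we wait, instead of
			 * blocking it by spinning with preemption off.
			 */
			preempt_enable();
			while (!arg.done)
				cpu_relax();
			preempt_disable();
		}

That still burns CPU time, but it no longer prevents stopper work from
running on the spinning CPU.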

-Scott