Re: [PATCH RT 8/8] sched: Lazy migrate_disable processing

From: Sebastian Andrzej Siewior
Date: Tue Sep 17 2019 - 12:50:36 EST


On 2019-07-27 00:56:38 [-0500], Scott Wood wrote:
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 885a195dfbe0..0096acf1a692 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -939,17 +893,34 @@ static int takedown_cpu(unsigned int cpu)
> */
> irq_lock_sparse();
>
> -#ifdef CONFIG_PREEMPT_RT_FULL
> - __write_rt_lock(cpuhp_pin);
> +#ifdef CONFIG_PREEMPT_RT_BASE
> + WARN_ON_ONCE(takedown_cpu_task);
> + takedown_cpu_task = current;
> +
> +again:
> + for (;;) {
> + int nr_pinned;
> +
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + nr_pinned = cpu_nr_pinned(cpu);
> + if (nr_pinned == 0)
> + break;
> + schedule();
> + }

We used to have cpuhp_pin, which ensured that once we owned the write
lock, no more tasks could enter a migrate_disable() section on this
CPU. It was placed fairly late to ensure that nothing new comes in as
part of the shutdown process and that everything still inside a
migrate_disable() section gets flushed out.
Now you claim that once the counter has reached zero, it never
increments again. I would be happier if there were an explicit check
for that :)
There is no back-off and flush mechanism, which means that on a busy
CPU (as in, heavily lock-contended by multiple tasks) this will wait
until the CPU becomes idle again.
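To illustrate what such an explicit check could look like, here is a
minimal userspace sketch that uses pthreads as a stand-in for the
kernel primitives. It is not the patch's code and not kernel API:
struct pin_state, pin_enter(), pin_exit() and wait_for_unpinned() are
hypothetical names. The teardown side first closes the door
(going_down) and only then waits for the counter to drain, so
reaching zero is final, and pin_enter() refuses new pins once
teardown has started. A back-off/flush step is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of a per-CPU "pinned tasks" count. */
struct pin_state {
	pthread_mutex_t lock;
	pthread_cond_t  drained;
	int  nr_pinned;   /* tasks currently inside a pinned section */
	bool going_down;  /* set once teardown starts waiting */
};

#define PIN_STATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false }

/* Enter a pinned section; the explicit check refuses new pins after
 * teardown has begun, so the count can never grow again. */
static bool pin_enter(struct pin_state *s)
{
	bool ok;

	pthread_mutex_lock(&s->lock);
	ok = !s->going_down;
	if (ok)
		s->nr_pinned++;
	pthread_mutex_unlock(&s->lock);
	return ok;
}

/* Leave a pinned section; wake the teardown waiter on the last exit. */
static void pin_exit(struct pin_state *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->nr_pinned > 0);
	if (--s->nr_pinned == 0)
		pthread_cond_broadcast(&s->drained);
	pthread_mutex_unlock(&s->lock);
}

/* Teardown: forbid new pins first, then wait for existing ones to drain. */
static void wait_for_unpinned(struct pin_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->going_down = true;
	while (s->nr_pinned != 0)
		pthread_cond_wait(&s->drained, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

What a refused caller does instead (migrate away, retry elsewhere) is
left to the caller in this sketch.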

> + set_current_state(TASK_RUNNING);
> #endif

Sebastian