Re: WARNING at kernel/sched/core.c:2013 migration_cpu_stop+0x2e3/0x330

From: Peter Zijlstra
Date: Tue Nov 17 2020 - 06:07:06 EST


On Mon, Nov 16, 2020 at 10:00:14AM +0000, Valentin Schneider wrote:
>
> On 15/11/20 22:32, Oleksandr Natalenko wrote:
> > Hi.
> >
> > I'm running v5.10-rc3-rt7 for some time, and I came across this splat in
> > dmesg:
> >
> > ```
> > [118769.951010] ------------[ cut here ]------------
> > [118769.951013] WARNING: CPU: 19 PID: 146 at kernel/sched/core.c:2013
>
> Err, I didn't pick up on this back then, but isn't that check bogus? If the
> task is enqueued elsewhere, it's valid for it not to be affined
> 'here'. Also that is_migration_disabled() check within is_cpu_allowed()
> makes me think this isn't the best thing to call on a remote task.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1218f3ce1713..47d5b677585f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2010,7 +2010,7 @@ static int migration_cpu_stop(void *data)
> * valid again. Nothing to do.
> */
> if (!pending) {
> - WARN_ON_ONCE(!is_cpu_allowed(p, cpu_of(rq)));
> + WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), p->cpus_ptr));

Ho humm.. bit of a mess that. I'm trying to figure out if we need that
is_per_cpu_kthread() test here or not.

I suppose not; what we want here is to ensure the CPU is in cpus_mask
and not care about the whole hotplug mess.

Would it make sense to replace both instances in migration_cpu_stop()
with:

WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));

?
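
To make the difference between the two checks concrete, here is a rough
userspace model. It is only a sketch, not kernel code: struct model_task,
model_mask_test_cpu(), model_check_stopper_cpu() and model_check_task_cpu()
are hypothetical stand-ins, the affinity mask is a plain 64-bit word, and
the hotplug and migrate-disable state that is_cpu_allowed() also looks at
is deliberately left out.

```c
/* Userspace model only; every name here is a simplified, hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

struct model_task {
	uint64_t cpus_mask;	/* bit N set: task may run on CPU N */
	int cpu;		/* CPU the task is currently enqueued on */
};

/* Stand-in for a cpumask bit test on a single-word mask. */
static bool model_mask_test_cpu(int cpu, uint64_t mask)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return ((mask >> cpu) & 1ULL) != 0;
}

/*
 * Models the shape of the old check: it asks whether the CPU running
 * the stopper is in the task's mask, even if the task lives elsewhere.
 */
static bool model_check_stopper_cpu(const struct model_task *p, int stopper_cpu)
{
	return model_mask_test_cpu(stopper_cpu, p->cpus_mask);
}

/*
 * Models the shape of the proposed check: it asks whether the CPU the
 * task is actually enqueued on is in the task's mask.
 */
static bool model_check_task_cpu(const struct model_task *p)
{
	return model_mask_test_cpu(p->cpu, p->cpus_mask);
}
```

With a task affined to CPUs 2-3 and enqueued on CPU 2, a stopper running
on CPU 0 fails the first test (so the old WARN would fire) while the
second test passes, which is the situation Valentin describes.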