Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier

From: Will Deacon
Date: Thu Nov 05 2020 - 17:22:49 EST


On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote:
> On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote:
> > The call to rcu_cpu_starting() in secondary_start_kernel() is not early
> > enough in the CPU-hotplug onlining process, which results in lockdep
> > splats as follows:
> >
> > WARNING: suspicious RCU usage
> > -----------------------------
> > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
> >
> > [...]
>
> Applied to arm64 (for-next/fixes), thanks!
>
> [1/1] arm64/smp: Move rcu_cpu_starting() earlier
> https://git.kernel.org/arm64/c/ce3d31ad3cac

Hmm, this patch has caused a regression in the case that we fail to
online a CPU because it has incompatible CPU features and so we park it
in cpu_die_early(). We now get an endless spew of RCU stalls because the
core will never come online, but is being tracked by RCU. So I'm tempted
to revert this and live with the lockdep warning while we figure out a
proper fix.

What's the correct way to undo rcu_cpu_starting(), given that we cannot
invoke the full hotplug machinery here? Is it correct to call
rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the
CPU doing cpu_up(), or should we do something else?

Will