Re: cpu stopper threads and load balancing leads to deadlock

From: Peter Zijlstra
Date: Thu May 03 2018 - 12:45:20 EST


On Thu, May 03, 2018 at 09:12:31AM -0700, Paul E. McKenney wrote:
> On Thu, May 03, 2018 at 04:44:50PM +0200, Peter Zijlstra wrote:
> > On Thu, May 03, 2018 at 04:16:55PM +0200, Mike Galbraith wrote:
> > > On Thu, 2018-05-03 at 15:56 +0200, Peter Zijlstra wrote:
> > > > On Thu, May 03, 2018 at 03:32:39PM +0200, Mike Galbraith wrote:
> > > >
> > > > > Dang. With $subject fix applied as well..
> > > >
> > > > That's a NO then... :-(
> > >
> > > Could say who cares about oddball offline wakeup stat. <cringe>
> >
> > Yeah, nobody.. but I don't want to have to change the wakeup code to
> > deal with this if at all possible. That'd just add conditions that are
> > 'always' false, except in this exceedingly rare circumstance.
> >
> > So ideally we manage to tell RCU that it needs to pay attention while
> > we're doing this here thing, which is what I thought RCU_NONIDLE() was
> > about.
>
> One straightforward approach would be to provide an arch-specific
> Kconfig option that tells notify_cpu_starting() not to bother invoking
> rcu_cpu_starting(). Then x86 selects this Kconfig option and invokes
> rcu_cpu_starting() itself early enough to avoid splats.
>
> See the (untested, probably does not even build) patch below.
>
> I have no idea where to insert either the "select" or the call to
> rcu_cpu_starting(), so I left those out. I know that putting the
> call too early will cause trouble, but I have no idea what constitutes
> "too early". :-/

Something like so perhaps? Mike, can you play around with that? Could
burn your granny and eat your cookies.


diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 7468de429087..07360523c3ce 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -793,6 +793,9 @@ void mtrr_ap_init(void)

 	if (!use_intel() || mtrr_aps_delayed_init)
 		return;
+
+	rcu_cpu_starting(smp_processor_id());
+
 	/*
 	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
 	 * changed, but this routine will be called in cpu boot time,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a734692a581..4dab46950fdb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3775,6 +3775,8 @@ int rcutree_dead_cpu(unsigned int cpu)
 	return 0;
 }

+static DEFINE_PER_CPU(int, rcu_cpu_started);
+
 /*
  * Mark the specified CPU as being online so that subsequent grace periods
  * (both expedited and normal) will wait on it. Note that this means that
@@ -3796,6 +3798,11 @@ void rcu_cpu_starting(unsigned int cpu)
 	struct rcu_node *rnp;
 	struct rcu_state *rsp;

+	if (per_cpu(rcu_cpu_started, cpu))
+		return;
+
+	per_cpu(rcu_cpu_started, cpu) = 1;
+
 	for_each_rcu_flavor(rsp) {
 		rdp = per_cpu_ptr(rsp->rda, cpu);
 		rnp = rdp->mynode;
@@ -3852,6 +3859,8 @@ void rcu_report_dead(unsigned int cpu)
 	preempt_enable();
 	for_each_rcu_flavor(rsp)
 		rcu_cleanup_dying_idle_cpu(cpu, rsp);
+
+	per_cpu(rcu_cpu_started, cpu) = 0;
 }

 /* Migrate the dead CPU's callbacks to the current CPU. */
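
For anyone following along, the guard in the patch above can be modelled
in plain userspace C. This is only a rough sketch, not kernel code: the
arrays stand in for the per-CPU variable, and NR_CPUS_MODEL, the model_*
functions and the work counter are made-up names for illustration. It
assumes the early call comes from mtrr_ap_init() and the later one from
notify_cpu_starting(), as discussed in the thread.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* Stand-in for the per-CPU rcu_cpu_started flag added by the patch. */
static int rcu_cpu_started[NR_CPUS_MODEL];

/* Counts how often the actual onlining work runs (illustration only). */
static int online_work_done[NR_CPUS_MODEL];

/* Models rcu_cpu_starting(): only the first call per online cycle does work. */
static void model_rcu_cpu_starting(unsigned int cpu)
{
	if (rcu_cpu_started[cpu])
		return;

	rcu_cpu_started[cpu] = 1;
	online_work_done[cpu]++;
}

/* Models the tail of rcu_report_dead(): re-arm the guard for the next online. */
static void model_rcu_report_dead(unsigned int cpu)
{
	rcu_cpu_started[cpu] = 0;
}

/*
 * Runs one online/offline/online sequence for @cpu and returns how many
 * times the onlining work ran in total.
 */
static int model_hotplug_cycle(unsigned int cpu)
{
	model_rcu_cpu_starting(cpu);	/* early call, e.g. from arch code */
	model_rcu_cpu_starting(cpu);	/* later call, skipped by the guard */
	assert(online_work_done[cpu] == 1);

	model_rcu_report_dead(cpu);	/* CPU goes offline, flag cleared */
	model_rcu_cpu_starting(cpu);	/* next online does the work again */

	return online_work_done[cpu];
}
```

Calling model_hotplug_cycle(1) on a fresh model should give 2: once for
each time the CPU comes online, no matter how many callers race to
announce it. The model ignores locking and ordering entirely, which is
where the real risk in the patch would be.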