Re: [PATCH] fix rcu vs hotplug race
From: Gautham R Shenoy
Date: Fri Jun 27 2008 - 00:46:27 EST
On Thu, Jun 26, 2008 at 08:27:28AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
> >
> > * Gautham R Shenoy <ego@xxxxxxxxxx> wrote:
> >
> > > > hm, not sure - we might just be fighting the symptom and we might
> > > > now create a silent resource leak instead. Isnt a full RCU quiescent
> > > > state forced (on all CPUs) before a CPU is cleared out of
> > > > cpu_online_map? That way the to-be-offlined CPU should never
> > > > actually show up in rcp->cpumask.
> > >
> > > No, this does not happen currently. The rcp->cpumask is always
> > > initialized to cpu_online_map&~nohz_cpu_mask when we start a new
> > > batch. Hence, before the batch ends, if a cpu goes offline we _can_
> > > have a stale rcp->cpumask, till the RCU subsystem has handled it's
> > > CPU_DEAD notification.
> > >
> > > Thus for a tiny interval, the rcp->cpumask would contain the offlined
> > > CPU. One of the alternatives is probably to handle this using
> > > CPU_DYING notifier instead of CPU_DEAD where we can call
> > > __rcu_offline_cpu().
> > >
> > > The warn_on that dhaval was hitting was because of some cpu-offline
> > > that was called just before we did a local_irq_save inside call_rcu().
> > > But at that time, the rcp->cpumask was still stale, and hence we ended
> > > up sending a smp_reschedule() to an offlined cpu. So the check may not
> > > create any resource leak.
> >
> > the check may not - but the problem it highlights might and with the
> > patch we'd end up hiding potential problems in this area.
> >
> > Paul, what do you think about this mixed CPU hotplug plus RCU workload?
>
> RCU most certainly needs to work correctly in face of arbitrary sequences
> of CPU-hotplug events, and should therefore be tested with arbitrary
> CPU-hotplug tests. And RCU also most certainly needs to refrain from
> issuing spurious warning messages that might over time be ignored,
> possibly causing someone to miss a real bug. My concern with this patch
> is in the second spurious-warning area.
IMHO the warning is a spurious one.
Here's the timeline.
CPU_A CPU_B
--------------------------------------------------------------
cpu_down(): .
. .
. .
stop_machine(): /* disables preemption, .
* and irqs */ .
. .
. .
take_cpu_down(); .
. .
. .
. .
cpu_disable(); /*this removes cpu .
*from cpu_online_map .
*/ .
. .
. .
restart_machine(); /* enables irqs */ .
------WINDOW DURING WHICH rcp->cpumask is stale ---------------
. call_rcu();
. /* disables irqs here */
. .force_quiescent_state();
.CPU_DEAD: .for_each_cpu(rcp->cpumask)
. . smp_send_reschedule();
. .
. . WARN_ON() for offlined CPU!
.
.
.
rcu_cpu_notify:
.
-------- WINDOW ENDS ------------------------------------------
rcu_offline_cpu() /* Which calls cpu_quiet()
* which removes
* cpu from rcp->cpumask.
*/
If a new batch was started just before calling stop_machine_run(), the
"tobe-offlined" cpu is still present in rcp-cpumask.
During a cpu-offline, from take_cpu_down(),
we queue an rt-prio idle task
as the next task to be picked by the scheduler.
We also call cpu_disable() which will disable any further
interrupts and remove the cpu's bit from the cpu_online_map.
Once the stop_machine_run() successfully calls take_cpu_down(),
it calls schedule(). That's the last time a schedule is called
on the offlined cpu, and hence the last time when rdp->passed_quiesc
will be set to 1 through rcu_qsctr_inc().
But the cpu_quiet() will be on this cpu will be called only when the
next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU
is still set in rcp->cpumask.
Now coming back to the idle_task which truely offlines the CPU, it does
check for a pending RCU and raises the softirq, since it will find
rdp->passed_quiesc to be 0 in this case. However, since the cpu is offline
I am not sure if the softirq will trigger on the CPU.
Even if it doesn't the rcu_offline_cpu() will find that rcp->completed
is not the same as rcp->cur, which means that our cpu could be holding
up the grace period progression. Hence we call cpu_quiet() and move
ahead.
But because of the window explained in the timeline, we could still have
a call_rcu() before the RCU subsystem executes it's CPU_DEAD
notification, and we send smp_send_reschedule() to offlined cpu while
trying to force the quiescent states. The appended patch adds comments
and prevents checking for offlined cpu everytime.
Author: Gautham R Shenoy <ego@xxxxxxxxxx>
Date: Fri Jun 27 09:33:55 2008 +0530
cpu_online_map is updated by the _cpu_down()
using stop_machine_run(). Since force_quiescent_state is invoked from
irqs disabled section, stop_machine_run() won't be executing while
a cpu is executing force_quiescent_state(). Hence the
cpu_online_map is stable while we're in the irq disabled section.
However, a cpu might have been offlined _just_ before
we disabled irqs while entering force_quiescent_state().
And rcu subsystem might not yet have handled the CPU_DEAD
notification, leading to the offlined cpu's bit
being set in the rcp->cpumask.
Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
sending smp_reschedule() to an offlined CPU.
Signed-off-by: Gautham R Shenoy <ego@xxxxxxxxxx>
Cc: Dhaval Giani <dhaval@xxxxxxxxxxxxxxxxxx>
diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index f4ffbd0..a38895a 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -89,8 +89,22 @@ static void force_quiescent_state(struct rcu_data *rdp,
/*
* Don't send IPI to itself. With irqs disabled,
* rdp->cpu is the current cpu.
+ *
+ * cpu_online_map is updated by the _cpu_down()
+ * using stop_machine_run(). Since we're in irqs disabled
+ * section, stop_machine_run() is not exectuting, hence
+ * the cpu_online_map is stable.
+ *
+ * However, a cpu might have been offlined _just_ before
+ * we disabled irqs while entering here.
+ * And rcu subsystem might not yet have handled the CPU_DEAD
+ * notification, leading to the offlined cpu's bit
+ * being set in the rcp->cpumask.
+ *
+ * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
+ * sending smp_reschedule() to an offlined CPU.
*/
- cpumask = rcp->cpumask;
+ cpus_and(cpumask, rcp->cpumask, cpu_online_map);
cpu_clear(rdp->cpu, cpumask);
for_each_cpu_mask(cpu, cpumask)
smp_send_reschedule(cpu);
>
> Not sure I answered the actual question, though...
>
> Thanx, Paul
--
Thanks and Regards
gautham
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/