Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCUimplementation
From: Paul E. McKenney
Date: Sat Aug 30 2008 - 10:34:52 EST
On Sat, Aug 30, 2008 at 03:32:36PM +0200, Manfred Spraul wrote:
> Lai Jiangshan wrote:
>> I just had a fast review. so my comments is nothing but cleanup.
>>
>> Thanks, Lai.
>>
>> Paul E. McKenney wrote:
>>
>>> Hello!
>>>
>>
>>
>>> +rcu_start_gp(struct rcu_state *rsp, unsigned long iflg)
>>> + __releases(rsp->rda[smp_processor_id()]->lock)
>>> +{
>>> + unsigned long flags = iflg;
>>> + struct rcu_data *rdp = rsp->rda[smp_processor_id()];
>>> + struct rcu_node *rnp = rcu_get_root(rsp);
>>> + struct rcu_node *rnp_cur;
>>> + struct rcu_node *rnp_end;
>>> +
>>> + if (!cpu_needs_another_gp(rsp, rdp)) {
>>> /*
>>> - * Accessing nohz_cpu_mask before incrementing rcp->cur needs a
>>> - * Barrier Otherwise it can cause tickless idle CPUs to be
>>> - * included in rcp->cpumask, which will extend graceperiods
>>> - * unnecessarily.
>>> + * Either there is no need to detect any more grace periods
>>> + * at the moment, or we are already in the process of
>>> + * detecting one. Either way, we should not start a new
>>> + * RCU grace period, so drop the lock and return.
>>> */
>>> - smp_mb();
>>> - cpus_andnot(rcp->cpumask, cpu_online_map, nohz_cpu_mask);
>>> + spin_unlock_irqrestore(&rnp->lock, flags);
>>> + return;
>>> + }
>>> +
>>> + /* Advance to a new grace period and initialize state. */
>>> +
>>> + rsp->gpnum++;
>>> + rsp->signaled = RCU_SIGNAL_INIT;
>>> + rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
>>> + record_gp_stall_check_time();
>>> + dyntick_save_completed(rsp, rsp->completed - 1);
>>> + note_new_gpnum(rsp, rdp);
>>> +
>>> + /*
>>> + * Because we are first, we know that all our callbacks will
>>> + * be covered by this upcoming grace period, even the ones
>>> + * that were registered arbitrarily recently.
>>> + */
>>> +
>>> + rcu_next_callbacks_are_ready(rdp);
>>> + rdp->nxttail[RCU_WAIT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
>>> - rcp->signaled = 0;
>>> + /* Special-case the common single-level case. */
>>> +
>>> + if (NUM_RCU_NODES == 1) {
>>> + rnp->qsmask = rnp->qsmaskinit;
>>>
>>
>> I tried a mask like qsmaskinit before. The system came to deadlock
>> when I did on/offline cpus.
>> I didn't find out the whys for I bethought of these two problem:
>>
>> problem 1:
>> ----race condition 1:
>> <cpu_down>
>> synchronize_rcu <called from offline handler in other subsystem>
>> rcu_offline_cpu
>>
>>
>> -----race condition 2:
>> rcu_online_cpu
>> synchronize_rcu <called from online handler in other subsystem>
>> <cpu_up>
>>
>> in these two condition, synchronize_rcu isblocked for ever for
>> synchronize_rcu have to wait a cpu in rnp->qsmask, but this
>> cpu don't run.
>>
>>
> Can we disallow synchronize_rcu() from the cpu notifiers? Are there any
> users that do a synchronize_rcu() from within the notifiers?
> I don't see any other solution.
I made force_quiescent_state() check for offline CPUs. (Well, actually
it is rcu_implicit_offline_qs(), which is indirectly called from
force_quiescent_state().
> Something like qsmaskinit is needed - always enumerating all cpus just
> doesn't scale.
Agreed!!!
> Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet
> how to handle read-side critical sections in CPU_DYING handlers.
> Interrupts after CPU_DYING could be handled by rcu_irq_enter(),
> rcu_irq_exit() [yes, they exist on x86: the arch code enables the local
> interrupts in order to process the currently queued interrupts]
My feeling is that CPU online/offline will be quite rare, so it should
be OK to clean up after the races in force_quiescent_state(), which in
this version is called every three ticks in a given grace period.
Yes, I did worry about the possibility of all CPUs being in dyntick-idle
mode, and the solution for that is (1) don't let a CPU that has RCU
callbacks pending go into dyntick-idle mode via rcu_needs_cpu() and
(2) don't let a grace period start unless there is at least one callback
that is not yet in the done state. But no, I am not certain that I have
gotten this completely correct yet.
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/