Re: [GIT PULL rcu/next] rcu commits for 2.6.40

From: Paul E. McKenney
Date: Sat May 14 2011 - 10:49:40 EST


On Fri, May 13, 2011 at 05:07:44PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, May 13, 2011 at 03:12:18PM +0200, Ingo Molnar wrote:
> > >
> > > * Ingo Molnar <mingo@xxxxxxx> wrote:
> > >
> > > > I started bisecting this, and the two relevant endpoints:
> > > >
> > > > bad: 11c476f: net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
> > > > good: 0ee5623f: Linux 2.6.39-rc6
> > > >
> > > > very clearly indicate that this is an RCU regression.
> > >
> > > This might be the same one Yinghai found:
> > >
> > > e59fb3120bec: rcu: Decrease memory-barrier usage based on semi-formal proof
> > >
> > > So with the config I sent it's definitely reproducible.
> > >
> > > At first sight, couldn't this be related not to barriers, but to not setting
> > > need_resched() like we did before?
> >
> > Thank you both!!! I had inspected the commit, but missed the fact that
> > the new version refuses to call set_need_resched() if irqs are enabled. :-(
> > The following (untested) patch restores the set_need_resched() operation.
>
> Btw., in hindsight, e59fb3120bec was a tad big, which made analysis harder.
>
> Would it have been possible to split it in two, one for the movement of the
> notifiers, the other for the barrier changes?
>
> That way the bisection would have fingered the movement commit. Or so.

In hindsight, that certainly would have been better.

> > Does this help?
>
> No, unfortunately not, the long delay is still there:
>
> device: 'ttyS0': device_add
> PM: Adding info for No Bus:ttyS0
> INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0} (detected by 1, t=6002 jiffies)

I was afraid of that...

On the off-chance that moving the memory barriers was at fault,
the following patch restores all of them that don't have in situ
replacements. Grasping at straws, admittedly.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 8c490ef..a4a2ef0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1449,10 +1449,12 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
  */
 static void rcu_process_callbacks(void)
 {
+	smp_mb();
 	__rcu_process_callbacks(&rcu_sched_state,
 				&__get_cpu_var(rcu_sched_data));
 	__rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
 	rcu_preempt_process_callbacks();
+	smp_mb();
 
 	/* If we are last CPU on way to dyntick-idle mode, accelerate it. */
 	rcu_needs_cpu_flush();
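
For readers less familiar with the pattern, here is a minimal userspace sketch of what the patch above does structurally: it places a full memory barrier before and after the callback-processing step. This is not kernel code. It assumes that C11's atomic_thread_fence(memory_order_seq_cst) is an acceptable rough stand-in for smp_mb(), and the names full_barrier(), struct cb_queue, queue_cb(), process_callbacks() and bump() are hypothetical, invented purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CBS 8

typedef void (*cb_fn)(int *);

/* Hypothetical callback queue; not a kernel data structure. */
struct cb_queue {
	cb_fn fn[MAX_CBS];
	int *arg[MAX_CBS];
	size_t len;
};

/* Assumed userspace approximation of smp_mb(): a full fence. */
static inline void full_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Queue one callback; returns 0 on success, -1 if the queue is full. */
static int queue_cb(struct cb_queue *q, cb_fn fn, int *arg)
{
	if (q->len >= MAX_CBS)
		return -1;
	q->fn[q->len] = fn;
	q->arg[q->len] = arg;
	q->len++;
	return 0;
}

/* Invoke all queued callbacks, bracketed by full barriers. */
static size_t process_callbacks(struct cb_queue *q)
{
	size_t i, invoked = 0;

	assert(q->len <= MAX_CBS);
	full_barrier();	/* order earlier accesses before processing */
	for (i = 0; i < q->len; i++) {
		q->fn[i](q->arg[i]);
		invoked++;
	}
	q->len = 0;
	full_barrier();	/* order processing before later accesses */
	return invoked;
}

/* Example callback: increment the counter it is handed. */
static void bump(int *counter)
{
	(*counter)++;
}
```

In a single-threaded run the fences have no visible effect; they only constrain how other CPUs or threads may observe the accesses on either side of the processing step. That is why a patch like this changes no logic and can be used to test whether missing ordering is the culprit.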