Re: [GIT PULL rcu/next] rcu commits for 2.6.40

From: Paul E. McKenney
Date: Sun May 15 2011 - 02:02:08 EST


On Sat, May 14, 2011 at 10:41:29PM -0700, Yinghai Lu wrote:
> On 05/14/2011 09:14 PM, Yinghai Lu wrote:
> > On 05/14/2011 11:34 AM, Paul E. McKenney wrote:
> >>> and do the inspection afterwards.
> >>
> >> And here is a lightly-tested patch, which applies on tip/core/rcu.
> >>
> >> This problem could account for both the long delays seen with e59fb312
> >> (Decrease memory-barrier usage based on semi-formal proof) and the
> >> shorter delays seen with a26ac245 (move TREE_RCU from softirq to kthread).
> >
> > yes. it fixes the problem.
> >
> > for 1024g system when hotadd mem enabled in kernel config
> >
> > [ 31.814803] cpu_dev_init done
> > [ 35.437163] memory_dev_init done
> >
> > even it is with gcc from opensuse 11.3
>
> got:
>
> [ 86.931217] Switched to NOHz mode on CPU #0
> [ 86.931272] Switched to NOHz mode on CPU #25
> [ 86.931278] ------------[ cut here ]------------
> [ 86.931290] WARNING: at kernel/rcutree.c:364 rcu_enter_nohz+0x44/0x76()
> [ 86.931294] Hardware name: Sun Fire X4800 M2
> [ 86.931297] Modules linked in:
> [ 86.931303] Pid: 0, comm: swapper Not tainted 2.6.39-rc7-tip-yh-04836-g5e42dc2-dirty #3
> [ 86.931307] Call Trace:
> [ 86.931333] [<ffffffff81080280>] warn_slowpath_common+0x85/0x9d
> [ 86.931338] Switched to NOHz mode on CPU #74
> [ 86.931346] [<ffffffff810802b2>] warn_slowpath_null+0x1a/0x1c
> [ 86.931356] [<ffffffff810d3615>] rcu_enter_nohz+0x44/0x76
> [ 86.931370] [<ffffffff810ab3cb>] tick_nohz_stop_sched_tick+0x27d/0x366
> [ 86.931381] [<ffffffff810391bc>] cpu_idle+0x7a/0xcc
> [ 86.931397] [<ffffffff81bd1aa3>] rest_init+0xb7/0xbe
> [ 86.931408] [<ffffffff81bd19ec>] ? csum_partial_copy_generic+0x16c/0x16c
> [ 86.931423] [<ffffffff82738e39>] start_kernel+0x3b2/0x3bd
> [ 86.931428] Switched to NOHz mode on CPU #94
> [ 86.931436] [<ffffffff827382cc>] x86_64_start_reservations+0x9c/0xa0
> [ 86.931446] [<ffffffff827384a8>] x86_64_start_kernel+0x1d8/0x1e3
> [ 86.931463] ---[ end trace 2cfc591bf7de931f ]---
> [ 86.931598] Switched to NOHz mode on CPU #151
> [ 86.931613] Switched to NOHz mode on CPU #152

As I expected!

There is a dyntick entry/exit mismatch somewhere. I haven't yet been able
to find it by inspection, and I cannot reproduce it on the systems that I
have access to.

One way that this could happen is if the interrupt-exit code on your
architecture sometimes failed to call irq_exit().
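
To make the mismatch concrete, here is a minimal user-space sketch of
the sort of nesting bookkeeping involved. It is only a model under
stated assumptions: the struct and every model_*() function are
hypothetical names invented for illustration, not the code in
kernel/rcutree.c, and the check is simplified to "nesting must reach
zero on idle entry".

```c
/*
 * Hypothetical user-space model of dyntick nesting bookkeeping.
 * This is NOT the kernel's rcutree.c logic; all names are invented.
 */
struct cpu_model {
	int nesting;	/* 1 while running task-level code, +1 per irq level */
	int mismatches;	/* count of idle entries that saw unbalanced nesting */
};

/* Interrupt entry: one more level of non-idle nesting. */
static void model_irq_enter(struct cpu_model *c)
{
	c->nesting++;
}

/* Interrupt exit: must pair with every model_irq_enter(). */
static void model_irq_exit(struct cpu_model *c)
{
	c->nesting--;
}

/*
 * Idle entry: drops the task-level count.  If the result is not zero,
 * some earlier enter/exit pair was unbalanced.  Returns 1 on mismatch.
 */
static int model_enter_idle(struct cpu_model *c)
{
	if (--c->nesting != 0) {
		c->mismatches++;
		return 1;
	}
	return 0;
}

/*
 * Take one interrupt, optionally "forgetting" the exit call, then go
 * idle.  Returns 1 if the idle entry detected a mismatch, else 0.
 */
static int simulate(int skip_irq_exit)
{
	struct cpu_model c = { .nesting = 1, .mismatches = 0 };

	model_irq_enter(&c);
	if (!skip_irq_exit)
		model_irq_exit(&c);
	return model_enter_idle(&c);
}
```

In this model simulate(0) reports no mismatch, while simulate(1), which
skips the exit call, leaves the count elevated and is flagged on idle
entry. The real code tracks more state than this, so treat it as an
illustration of the failure mode rather than a reproduction of the
warning above.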

Thanx, Paul