Re: [2.6.30-rc1] RCU detected CPU 1 stall

From: Paul E. McKenney
Date: Fri Apr 10 2009 - 11:32:49 EST


On Fri, Apr 10, 2009 at 04:03:53PM +0100, Al Viro wrote:
> On Fri, Apr 10, 2009 at 07:22:03AM -0700, Paul E. McKenney wrote:
>
> > Hmmmm... This indicates that CPU 1 was spinning in the kernel for
> > a long time. At 250 HZ, 32,565 jiffies is 130 seconds, or just over
> > two -minutes-. Ouch!!!
> >
> > The interrupt happened on the stalled CPU, so we know that interrupts
> > were enabled. Because we have CONFIG_PREEMPT_NONE=y, there is no
> > preemption, so preemption need not be disabled. This could be due
> > to lock contention, or even a simple infinite loop.
> >
> > The timer interrupt (apic_timer_interrupt) occurred in either
> > __bprm_mm_init(), __get_user_4(), count(), or do_execve(). There
> > have been some recent changes around check_unsafe_exec() -- any
> > possibility that these introduced excessive lock contention or
> > an infinite loop? Ditto for the recent security fixes?
>
> Oh, joy... the loop in there is this:
> 	for (t = next_thread(p); t != p; t = next_thread(t)) {
> 		if (t->fs == p->fs)
> 			n_fs++;
> 	}
> I find it hard to believe that it can take two minutes, though.

Tetsuo, how many tasks did you have on this machine?

Though I too find it hard to believe that there were enough to chew up
two minutes. Maybe the list got corrupted so that it has a loop?
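
For what it is worth, here is a rough user-space sketch of how one might
check for that kind of corruption. It is not kernel code: struct task,
its next and fs fields, and count_shared_fs() are hypothetical stand-ins
for task_struct, next_thread(), and the quoted loop. It assumes every
next pointer is non-NULL and only catches the case where the walk falls
into a cycle that never leads back to p, using a second pointer that
advances two steps per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only what the loop touches. */
struct task {
	struct task *next;	/* stand-in for next_thread() */
	const void *fs;		/* stand-in for ->fs */
};

/*
 * Count the other threads in p's ring that share p->fs, as the quoted
 * loop does.  Returns -1 instead of spinning forever if the walk enters
 * a cycle that does not contain p.
 */
static long count_shared_fs(const struct task *p)
{
	const struct task *slow, *fast;
	long n_fs = 0;
	int i;

	assert(p != NULL);
	slow = p->next;
	fast = p->next;

	while (slow != p) {
		if (slow->fs == p->fs)
			n_fs++;
		slow = slow->next;

		/* Advance fast by two, but park it once it is back at p. */
		for (i = 0; i < 2 && fast != p; i++)
			fast = fast->next;

		/*
		 * In a healthy ring fast stays ahead of slow until it
		 * parks at p.  Meeting slow anywhere else means both are
		 * trapped in a cycle that excludes p.
		 */
		if (fast != p && fast == slow)
			return -1;
	}
	return n_fs;
}
```

With a healthy ring this gives the same count as the original loop; with
a ring whose last element points back into the middle rather than at p,
it returns -1 after a number of steps proportional to the list length.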

Thanx, Paul