Re: recent -git: BUG in free_thread_xstate

From: Paul E. McKenney
Date: Fri Aug 08 2008 - 16:41:04 EST


On Fri, Aug 08, 2008 at 08:46:21PM +0200, Vegard Nossum wrote:
> On Fri, Aug 1, 2008 at 11:10 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Wed, Jul 23, 2008 at 01:31:09PM -0700, Suresh Siddha wrote:
> >> On Wed, Jul 23, 2008 at 01:07:04PM -0700, Vegard Nossum wrote:
> >> > Hi,
> >> >
> >> > I just got this on c010b2f76c3032e48097a6eef291d8593d5d79a6 (-git from
> >> > yesterday):
> >>
> >> Do you see this in 2.6.26 as well? I suspect it is coming from post-2.6.26
> >> changes.
> >>
> >> >
> >> > BUG: unable to handle kernel paging request at 00664381
> >> > IP: [<c010b274>] free_thread_xstate+0x4/0x30
> >> ...
> >>
> >> > EIP is at arch/x86/kernel/process.c:36:
> >> >
> >> > if (tsk->thread.xstate) {
> >> >
> >>
> >> It looks like the kernel stack of that process got corrupted, corrupting the
> >> task pointer in thread_info. Can you send us your config file?
> >
> > I would also like to see the config file.
>
> Hi,
>
> I'm sorry for the late reply.
>
> I copied you because I saw some RCU entries in the stack trace, but it
> is almost definitely not a problem with (core or "leaf") RCU code.
> Also, sometimes people recognize a problem and say "oh, I know this
> one, the patch has been posted here and here", etc.
>
> It seems to be a problem with either netpoll, netconsole, or the
> 8139too driver. I found a UDP packet in the task_struct slab, and the
> stack traces with RCU entries come from unrelated, unfortunate callbacks
> that stumbled upon the corruption.
>
> My config, if you are still interested, can be found here:
> http://userweb.kernel.org/~vegard/bugs/20080724-fork/config
>
> I don't know if the problem persists with the latest -git, since it has
> been a while since I last tested, but I've checked kernels back to 2.6.20,
> so the problem has existed for a long time.

Well, the config shows preemptable RCU, which was my concern at the time,
but there was certainly no preemptable RCU in mainline in 2.6.20, so it
cannot explain a bug that goes back that far.

There -was- a bug in 2.6.26 release candidates that would cause RCU
to fail badly on !HOTPLUG_CPU builds due to a failure to initialize,
but that is fixed in 2.6.26 (thank you, Nick!!!).

Thanx, Paul