Re: [3.4-rc3] Thread overran stack, or stack corrupted

From: Linus Torvalds
Date: Wed Apr 18 2012 - 13:02:27 EST


On Tue, Apr 17, 2012 at 8:19 PM, Dave Jones <davej@xxxxxxxxxx> wrote:
>
> So now that I'm instrumenting it, it's taking a lot longer to trigger
> (how typical). Almost 6 hours in though, it's down to 72 bytes, and spewed
> the traces below, which look pretty.. deep.

Yeah. Sadly, they are less useful than I was hoping for. It's not some
single deep call-chain, it's almost all debug stuff and the "did we
release the RCU lock" or preemption checks, which I guess makes sense.
You have tons of options enabled in your kernel that make for deeper
stack traces, and then all the interesting stuff gets overwritten by
what happened later.

For example, it looks like you have the USB serial console enabled,
and some of those deep stack traces are about that - and largely got
overwritten by the "dump_trace()" logic itself. So dump_trace()
printing stuff out also ended up overwriting the stack trace that we
were interested in.

I assume you have USB serial console on for a reason (i.e. it's great for
catching oopses before the machine dies), but in this case it hurts.

Could you try just adding a

    console_lock();
    ...
    console_unlock();

around the show_trace() call? That will force the code to not actually
call down to the console layer until after the console_unlock(), so
the printing of the stack trace won't affect the stack *too* much.
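
To make the deferral concrete, here is a toy userspace model of the idea,
not kernel code. Every toy_* name is hypothetical and only stands in for
the real thing: toy_printk() for printk(), toy_console_write() for a
console driver such as USB serial, toy_show_trace() for show_trace(). The
assumption it illustrates is that while the console lock is held, messages
only land in the log buffer, and the (deep) driver path is not entered
until the unlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */

#define TOY_LOG_SIZE 1024

static char toy_log[TOY_LOG_SIZE];  /* stands in for the printk log buffer */
static size_t toy_log_len;          /* bytes buffered so far */
static size_t toy_flushed;          /* bytes already handed to the "driver" */
static bool toy_console_locked;     /* stands in for the console lock */
static unsigned toy_driver_calls;   /* how many times the "driver" was entered */

/* Stand-in for a console driver; in the real case this is the deep call chain. */
static void toy_console_write(const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    toy_driver_calls++;
}

/* Hand everything not yet written to the "driver" in one go. */
static void toy_flush(void)
{
    if (toy_flushed < toy_log_len) {
        toy_console_write(toy_log + toy_flushed, toy_log_len - toy_flushed);
        toy_flushed = toy_log_len;
    }
}

/* Always buffer the message; only call into the "driver" if the lock is free. */
void toy_printk(const char *msg)
{
    size_t len = strlen(msg);

    if (len > TOY_LOG_SIZE - toy_log_len)
        len = TOY_LOG_SIZE - toy_log_len;  /* truncate when the buffer is full */
    memcpy(toy_log + toy_log_len, msg, len);
    toy_log_len += len;

    if (!toy_console_locked)
        toy_flush();
}

void toy_console_lock(void)
{
    assert(!toy_console_locked);
    toy_console_locked = true;
}

/* Releasing the lock is what finally pushes the buffered text out. */
void toy_console_unlock(void)
{
    assert(toy_console_locked);
    toy_flush();
    toy_console_locked = false;
}

/* Stand-in for show_trace(): emits a few lines through toy_printk(). */
void toy_show_trace(void)
{
    toy_printk("Call Trace:\n");
    toy_printk(" frame_a\n");
    toy_printk(" frame_b\n");
}
```

In this model, calling toy_show_trace() between toy_console_lock() and
toy_console_unlock() never enters toy_console_write() while the trace is
being produced; the single driver call happens at the unlock. Without the
lock, each toy_printk() goes straight into the driver.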

That said, if you get the function tracer thing working, that will
give much nicer backtraces.

Linus