Re: Soft lockup during suspend since ~2.6.36

From: Thilo-Alexander Ginkel
Date: Mon Apr 04 2011 - 09:57:42 EST


On Mon, Apr 4, 2011 at 05:02, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Sunday 03 April 2011, Thilo-Alexander Ginkel wrote:
>> Transcript:
>> | RIP: _raw_spin_unlock_irqrestore
>> | Call Stack:
>> |  _set_cpus_allowed_ptr
>> |  worker_maybe_bind_and_lock
>> |  rescuer_thread
>> |  rescuer_thread
>> |  kernel_thread_helper
>> |  kthread
>> |  kernel_thread_helper
>> |  kthread
>>
>> After some more time, the following backtrace is printed:
>>  https://secure.tgbyte.de/dropbox/ueph3Ohm-3.jpg
>>
>> Transcript:
>> | RIP: worker_maybe_bind_and_lock
>> | Call Stack:
>> |  rescuer_thread
>> |  rescuer_thread
>> |  kernel_thread_helper
>> |  kthread
>> |  kernel_thread_helper
>> |  kthread
>>
>> From then on these two backtraces are printed in an alternating fashion.
>>
>> Unfortunately, the top of the output is missing as it does not fit on
>> screen (does someone know an even smaller console font?), but I assume
>> it's the deadlock detection.
>
> The interesting thing is that the backtrace is from unlock, not from lock,
> so it can't really be a deadlock. However, the _raw_spin_unlock_irqrestore
> function calls debug_spin_unlock(), which does a few sanity checks. Maybe
> one of those got triggered.
>
> The easiest way to get the full output is usually to attach a serial
> NULL modem cable and redirect the console to that, so you can get the
> output on another machine. Another idea would be to modify the
> show_registers function in arch/x86/kernel/dumpstack_64.c so that
> it prints less data.

Unfortunately, the output via a serial console becomes garbled after
"Entering mem sleep", so I went for patching dumpstack_64.c and a
couple of other source files to reduce the verbosity. I hope I have not
stripped any essential information. The result is available in
these pictures:
https://secure.tgbyte.de/dropbox/IeZalo4t-1.jpg
https://secure.tgbyte.de/dropbox/IeZalo4t-2.jpg

For both traces, the printed error message reads: "BUG: soft lockup -
CPU#3 stuck for 67s! [kblockd:28]"

(After a bit of Googling I understand that a soft lockup is probably
different from a deadlock - please correct me if that assumption is
wrong)
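
To check that I am using the terms correctly, here is how I picture a
soft lockup, as a simplified Python sketch and not the kernel's actual
watchdog code. The function names, the 60s threshold and the timestamps
are my own assumptions for illustration: each CPU is expected to update a
timestamp periodically, and a lockup is reported once a CPU has failed to
do so for longer than the threshold, whether or not a lock is involved.

```python
# Simplified illustration of soft-lockup detection (NOT kernel code).
# Assumption: every CPU periodically records a "last touched" timestamp;
# a CPU is reported as stuck once that timestamp is older than a threshold.

SOFTLOCKUP_THRESHOLD_S = 60  # hypothetical threshold, in seconds


def find_soft_lockups(last_touch, now, threshold=SOFTLOCKUP_THRESHOLD_S):
    """Return {cpu: seconds_stuck} for CPUs whose timestamp is too old."""
    stuck = {}
    for cpu, touched_at in last_touch.items():
        age = now - touched_at
        if age > threshold:
            stuck[cpu] = age
    return stuck


def format_report(stuck, task="kblockd:28"):
    """Format messages in the style of the report quoted above."""
    return [
        f"BUG: soft lockup - CPU#{cpu} stuck for {int(age)}s! [{task}]"
        for cpu, age in sorted(stuck.items())
    ]


if __name__ == "__main__":
    # CPUs 0-2 were touched recently; CPU 3 has been busy for 67 seconds.
    last_touch = {0: 99.0, 1: 98.5, 2: 99.5, 3: 33.0}
    for line in format_report(find_soft_lockups(last_touch, now=100.0)):
        print(line)
```

A deadlock, by contrast, would be tasks waiting on each other forever. A
CPU spinning on a lock that is never released could presumably show up as
a soft lockup too, so the two need not be mutually exclusive.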

> Yet another idea would be to set /sys/kernel/printk_delay so that the
> oops gets printed slower.

Hm, that file does not exist on my machine. Does it need a special
compile-time config option to be enabled?
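
In case the knob is really a sysctl under /proc/sys/kernel/ rather than a
file under /sys/kernel/ (only a guess on my part), a small sketch like
the following would show whether it is present. The helper name and the
default path are my own assumptions.

```python
# Check whether a printk_delay knob exists and, if so, read its value.
# Assumption: the path below is a guess at where the sysctl might live;
# it is not the path given in the quoted suggestion.
import os

ASSUMED_PATH = "/proc/sys/kernel/printk_delay"


def read_printk_delay(path=ASSUMED_PATH):
    """Return the integer stored at path, or None if it is unavailable."""
    if not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Unreadable file or non-numeric content: treat as unavailable.
        return None


if __name__ == "__main__":
    value = read_printk_delay()
    if value is None:
        print(f"{ASSUMED_PATH} not available")
    else:
        print(f"printk_delay = {value}")
```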

Regards,
Thilo