Re: [PATCH 0/0] Panic on softdog timeout

From: Anithra P Janakiraman
Date: Thu Jan 20 2011 - 04:10:09 EST


On 01/18/2011 09:22 PM, Américo Wang wrote:
> On Tue, Jan 18, 2011 at 06:14:36PM +0530, Anithra P Janakiraman wrote:
>
>> Hi.
>>
>> We currently have no way of determining the reason for failure when a
>> softdog timeout occurs. At the minimum a snapshot of the system would
>> help to determine the cause.
>> The attached patch invokes panic on softdog timeout iff kdump is
>> configured, if kdump is not configured it works as usual.
>
> We don't do it in this way, check softlockup_panic, we have
> a boot parameter, i.e. "softlockup_panic=". :)


Some softdog-specific scenarios cannot be handled by the softlockup detector. We use softdog to watch for critical application failures, where the application may have failed without there being a softlockup as such.
For example, when doing high-availability tests on applications, softdog is set up so that the timer is reset by an application thread. If the application fails, the timer expires and causes a reboot. In such scenarios some information on what caused the failure would be useful, and I don't see how softlockup can be used. The patch I sent would be useful in these cases. If I am missing something, please do let me know.
I will make the modifications as suggested by Dave Hansen and post the patch shortly.
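
To make the scenario above concrete, here is a minimal sketch of an application thread resetting the softdog timer. It is only an illustration, not code from the patch: the names pet, disarm and heartbeat and the is_healthy callback are hypothetical, and it assumes the softdog module is loaded and exposes the standard /dev/watchdog node.

```python
import os
import time

# Standard Linux watchdog device node; assumed to be backed by softdog here.
WATCHDOG_DEVICE = "/dev/watchdog"


def pet(fd):
    """Reset the watchdog timer. Any write to the device counts as a keepalive."""
    os.write(fd, b"\0")


def disarm(fd):
    """Write the magic character 'V' so that closing the device stops the timer.

    This only works when the driver was not loaded with nowayout set.
    """
    os.write(fd, b"V")


def heartbeat(fd, is_healthy, interval, max_beats=None, sleep=time.sleep):
    """Pet the watchdog every `interval` seconds while is_healthy() returns True.

    is_healthy is a hypothetical application-supplied check. Once it returns
    False the loop stops petting, so the watchdog timer is left to expire.
    Returns the number of keepalives written.
    """
    beats = 0
    while max_beats is None or beats < max_beats:
        if not is_healthy():
            break
        pet(fd)
        beats += 1
        sleep(interval)
    return beats
```

On a real system this would be driven by something like fd = os.open(WATCHDOG_DEVICE, os.O_WRONLY) followed by heartbeat(fd, check, 5), run as root, with the interval kept well below softdog's soft_margin. Note that the application here has failed in its own terms while the kernel is still scheduling normally, which is why the softlockup detector never fires.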

Anithra.

