Re: Inquiry Regarding Handling of Kernel Crashes

From: Dr. David Alan Gilbert
Date: Sun May 12 2024 - 16:50:44 EST


* Muni Sekhar (munisekharrms@xxxxxxxxx) wrote:
> Dear Linux Kernel Community,

Hi,

> I hope this email finds you well. I am currently engaged in testing
> device drivers in Linux kernel mode, and I have encountered various
> types of kernel crashes during my testing process.
>
> Among these, some examples of kernel crashes include OOPS, lockups and others.
>
> I have a few questions regarding the handling of kernel crashes during testing:
>
> When encountering a kernel crash during testing, is it advisable to
> continue testing without rebooting the system? Or is it preferable to
> reboot the system after each kernel crash and then resume testing?

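Rebooting is best; after a crash the kernel may be left in an
inconsistent state, so results from later tests are hard to trust.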

> Can the first kernel crash, whether it is an OOPS, or any other type
> crash, potentially lead to subsequent crashes of the same or different
> types? If so, should debugging efforts focus only on the first kernel
> crash, or should all subsequent crashes also be considered and
> addressed?

Yes - not all failures do that, but some will cause follow-on crashes;
looking at the first crash normally gives the most reliable idea
of what went wrong. But keep all the logs; anything might help you figure
it out.

> In the event that the system needs to be rebooted after a kernel
> crash, how can user space test utilities be informed that a kernel
> crash has occurred? Additionally, how can the system be configured to
> automatically reboot in the event of a kernel crash?

See Documentation/admin-guide/kernel-parameters.txt; there are
quite a few useful parameters, in particular:
  oops=panic
will cause a panic after an oops, which when you combine it with
  panic=30
means an oops will then cause a panic, which causes a reboot
30 seconds later.
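
As a rough illustration, a user space test utility could check that
the running kernel was actually booted with those two parameters by
parsing /proc/cmdline. This is only a sketch: the function names are
made up for the example, and it looks at the boot command line only,
so it will not notice settings changed later through the kernel.panic
or kernel.panic_on_oops sysctls.

```python
def parse_panic_settings(cmdline: str) -> dict:
    """Extract the oops= and panic= settings from a kernel command line.

    Hypothetical helper for illustration; not part of any kernel tooling.
    """
    settings = {"oops_panics": False, "reboot_delay": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # bare flags such as "quiet" carry no value
        if key == "oops" and value == "panic":
            settings["oops_panics"] = True
        elif key == "panic":
            try:
                settings["reboot_delay"] = int(value)
            except ValueError:
                pass  # malformed value: treat as not set
    return settings


def will_reboot_on_oops(cmdline: str) -> bool:
    """True if an oops should panic and the panic should then reboot."""
    s = parse_panic_settings(cmdline)
    # panic=0 means wait forever; any non-zero timeout leads to a reboot.
    return s["oops_panics"] and s["reboot_delay"] not in (None, 0)
```

On a live system this would be called as
will_reboot_on_oops(open("/proc/cmdline").read()).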

You could also consider using a 'crash kernel' (kdump) - on a panic
the system boots into a fresh kernel that just saves a memory snapshot,
which you can then try to debug.
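
Before starting a test run it may be worth checking that a crash
kernel is really loaded, since otherwise a panic will not produce a
snapshot. The sketch below reads /sys/kernel/kexec_crash_loaded, which
reports 1 when one is loaded; the function name is invented for the
example, and it assumes memory was already reserved with the
crashkernel= parameter and the crash kernel loaded by your
distribution's kdump tooling.

```python
from pathlib import Path


def crash_kernel_loaded(sysfs_file: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """Return True if sysfs reports that a crash (kdump) kernel is loaded.

    Hypothetical helper for illustration only.
    """
    try:
        return Path(sysfs_file).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable, e.g. a kernel built without kexec.
        return False
```

A test harness could refuse to start, or at least warn, when this
returns False.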

Turning on a watchdog as well is good; some kernel bugs just hang
rather than giving a nice oops.
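
If "watchdog" is read as a hardware watchdog exposed through
/dev/watchdog (an assumption; the kernel's soft/hard lockup detectors
are another option), the usual pattern is that opening the device arms
the timer, each write resets it, and the machine is reset if the
writes stop. A minimal sketch, with a made-up function name and a
fixed number of rounds so it terminates:

```python
import time


def feed_watchdog(device: str = "/dev/watchdog",
                  interval: float = 10.0,
                  rounds: int = 3) -> None:
    """Keep a watchdog device fed for a fixed number of rounds.

    Hypothetical helper for illustration. A real harness would keep
    feeding for as long as the tests are running.
    """
    # Opening the device arms the timer; unbuffered so writes go straight out.
    with open(device, "wb", buffering=0) as wd:
        for _ in range(rounds):
            wd.write(b"\0")       # any write resets the timeout
            time.sleep(interval)  # must be shorter than the watchdog timeout
        # "Magic close": drivers that support it disarm the timer when
        # 'V' is written just before close. Kernels built with
        # CONFIG_WATCHDOG_NOWAYOUT ignore this and stay armed.
        wd.write(b"V")
```

Do not point this at the real device casually: once armed, the system
resets if the feeding stops, which is the point during testing but
surprising otherwise.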

> I would greatly appreciate any insights or best practices you can
> share regarding the handling of kernel crashes during testing. Your
> expertise and guidance on this matter would be invaluable to my
> testing efforts.
>
> Thank you very much for your time and assistance. I look forward to
> your response.

Good luck!

Dave

>
>
> --
> Thanks,
> Sekhar
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/