Re: [ELISA Safety Architecture WG] [PATCH v2 0/2] Introduce the pkill_on_warn parameter

From: James Bottomley
Date: Tue Nov 16 2021 - 08:21:06 EST


On Tue, 2021-11-16 at 09:41 +0100, Petr Mladek wrote:
[...]
> If I wanted to implement a super-reliable panic() I would
> use some external device that would cause power-reset when
> the watched device is not responding.

They're called watchdog timers. We have a whole subsystem full of
them:

drivers/watchdog

We used them in old cluster HA systems to guarantee successful recovery
of shared state from contaminated cluster members, but I think they'd
serve the reliable panic need equally well. Most server-class systems
today have them built in (on the BMC if they don't have a separate
mechanism); they're just not usually activated.
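
For anyone who hasn't driven one from userspace, here is a minimal
sketch, assuming a driver that exposes the standard /dev/watchdog
character device and ioctls from <linux/watchdog.h>. The helper names
(wdt_arm, wdt_pet, wdt_disarm) are made up for illustration and are not
part of any kernel or library API.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Open the watchdog device; on most drivers the open itself starts the
 * countdown. Returns the fd, or -1 if the device cannot be opened.
 */
int wdt_arm(const char *dev, int timeout_sec)
{
    int fd = open(dev, O_WRONLY);

    if (fd < 0)
        return -1;

    /* Best effort: a driver without settimeout support keeps its default. */
    (void)ioctl(fd, WDIOC_SETTIMEOUT, &timeout_sec);
    return fd;
}

/* Restart the countdown. Returns 0 on success, -1 on failure. */
int wdt_pet(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0) < 0 ? -1 : 0;
}

/*
 * Write the magic 'V' before closing so that drivers supporting magic
 * close stop the timer. With nowayout set, the timer keeps running.
 */
int wdt_disarm(int fd)
{
    int ok = write(fd, "V", 1) == 1;

    return (close(fd) == 0 && ok) ? 0 : -1;
}
```

A monitoring daemon would call wdt_arm("/dev/watchdog", 30) once and
then wdt_pet() every few seconds. If the machine wedges badly enough
that the daemon stops running, the hardware resets it when the timeout
expires, with no cooperation needed from the failed kernel.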

James