Re: [RFC PATCH] watchdog: Adding softwatchdog

From: Tetsuo Handa
Date: Sat Apr 24 2021 - 21:08:56 EST


On 2021/04/25 1:19, peter enderborg wrote:
>> I don't think this proposal is a watchdog. I think this proposal is
>> a timer based process killer, based on an assumption that any slowdown
>> which prevents the monitor process from pinging for more than 0.5 seconds
>> (if HZ == 1000) is caused by memory pressure.
>
> You are missing the point. The oom killer is an example of the work that it
> can do; it is one policy. The idea is that you should have a policy that fits
> your needs.

A policy that is implemented in the kernel and runs from timer interrupt context
is quite limited, because that context is not allowed to perform operations that
might sleep. See

[RFC] memory reserve for userspace oom-killer
https://lkml.kernel.org/r/CALvZod7vtDxJZtNhn81V=oE-EPOf=4KZB2Bv6Giz+u3bFFyOLg@xxxxxxxxxxxxxx

for implementing a possibly useful policy.

>
> oom_score_adj is suitable for the Android world. But it might be based on
> uids if your priority is some users over others. Or a memcg. Or, as
> Christophe Leroy wants, the current task. The policy is only an example that
> fits one area.

Horrible idea. Imagine a kernel module that randomly sends SIGTERM/SIGKILL
to the "current" thread. How can normal systems survive? A normal system is not
designed to survive random signals.

> You need to describe your prioritization; in Android it is
> oom_score_adj. For example, I would very much like to have a policy that
> sends SIGTERM instead of SIGKILL.

That's because the Android framework is designed to survive random signals
(in order to survive memory pressure situations).

> But the integration with oom is there because
> it is needed. Maybe a bad choice for political reasons, but I don't think it
> is a good idea to hide the intention. Please don't focus on the oom part.

I wonder what system other than the Android framework can utilize this module.

By the way, there is already a "Software Watchdog" (drivers/watchdog/softdog.c),
which some people might call a "soft watchdog". It is very confusing to name
your module "softwatchdog". Please find a different name.