Re: [RFC PATCH] watchdog: Adding softwatchdog

From: peter enderborg
Date: Sat Apr 24 2021 - 12:20:03 EST


On 4/24/21 5:23 PM, Tetsuo Handa wrote:
> On 2021/04/24 23:41, Guenter Roeck wrote:
>> On 4/24/21 3:25 AM, Peter Enderborg wrote:
>>> This is not a rebooting watchdog. Its function is to take
>>> actions other than a hard reboot. Many complex systems have some
>>> kind of manager that monitors the system and acts when it is slow.
>>> Android has its lowmemorykiller (lmkd); desktops have earlyoom.
>>> This watchdog can help such a monitor by performing some basic
>>> action to keep the monitor running.
>>>
>>> It can also be used standalone. This adds a policy that
>>> kills the process with the highest oom_score_adj, using
>>> oom functions to do it quickly. I think it is a good use case
>>> for the patch. Low-memory situations can be problematic for
>>> software that monitors the system, but other policies
>>> should also be possible, like picking tasks from a memcg,
>>> specific UIDs, or whatever is low priority.
>>> ---
>> NACK. Besides this not following the new watchdog API, the task
>> of a watchdog is to reset the system on failure. Its task is most
>> definitely not to re-implement the oom killer in any way, shape,
>> or form.
>>
> I don't think this proposal is a watchdog. I think this proposal is
> a timer based process killer, based on an assumption that any slowdown
> which prevents the monitor process from pinging for more than 0.5 seconds
> (if HZ == 1000) is caused by memory pressure.

You are missing the point. The oom killer is an example of the work it can do.
It is one policy. The idea is that you should have a policy that fits your needs.

oom_score_adj is suitable for the Android world. But the policy might be based
on UIDs if you prioritize some users over others. Or a memcg. Or, as
Christophe Leroy wants, the current task. The policy is only an example that
fits one area. You need to describe your prioritization; in Android it is
oom_score_adj. For example, I would very much like to have a policy that sends
SIGTERM instead of SIGKILL. But the integration with oom is there because
it is needed. Maybe a bad choice for political reasons, but I don't think it is
a good idea to hide the intention. Please don't focus on the oom part.
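
To illustrate the point about interchangeable policies, here is a minimal
userspace sketch in C. It is not taken from the patch and uses no kernel
API; struct candidate, struct policy, select_highest_adj(), select_by_uid()
and policy_pick() are all hypothetical names chosen for this example. It
only shows how the selection rule (oom_score_adj, UID, ...) and the signal
(SIGTERM or SIGKILL) could be kept separate from the timer that triggers
them.

```c
#include <stddef.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical snapshot of one candidate task; not a kernel structure. */
struct candidate {
    pid_t pid;
    uid_t uid;
    int oom_score_adj;  /* same range as /proc/<pid>/oom_score_adj */
};

/* A selector returns the index of the task to act on, or -1 for none. */
typedef int (*select_fn)(const struct candidate *tasks, size_t n,
                         const void *arg);

/* A policy pairs a selection rule with the signal to deliver. */
struct policy {
    select_fn select;
    const void *arg;    /* selector-specific parameter, may be NULL */
    int signo;          /* e.g. SIGTERM or SIGKILL */
};

/* Android-style rule: pick the task with the highest oom_score_adj.
 * On a tie, the first such task in the array wins. */
static int select_highest_adj(const struct candidate *tasks, size_t n,
                              const void *arg)
{
    int best = -1;

    (void)arg;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || tasks[i].oom_score_adj > tasks[best].oom_score_adj)
            best = (int)i;
    return best;
}

/* UID-based rule: pick the first task owned by the low-priority UID
 * that arg points to. */
static int select_by_uid(const struct candidate *tasks, size_t n,
                         const void *arg)
{
    uid_t low_prio = *(const uid_t *)arg;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].uid == low_prio)
            return (int)i;
    return -1;
}

/* Run a policy over the candidates. Returns the chosen pid and stores
 * the signal in *signo, or returns -1 if no task matched. */
static pid_t policy_pick(const struct policy *p,
                         const struct candidate *tasks, size_t n,
                         int *signo)
{
    int i = p->select(tasks, n, p->arg);

    if (i < 0)
        return -1;
    *signo = p->signo;
    return tasks[i].pid;
}
```

A caller would pass the returned pid and signal to kill(2) once the
timeout fires. Swapping the selector or the signal changes the policy
without touching the timer logic, which is the separation argued for
above.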