Re: [RFC] memory reserve for userspace oom-killer

From: Shakeel Butt
Date: Wed Apr 21 2021 - 15:18:41 EST


On Wed, Apr 21, 2021 at 11:46 AM <Peter.Enderborg@xxxxxxxx> wrote:
>
> On 4/21/21 8:28 PM, Shakeel Butt wrote:
> > On Wed, Apr 21, 2021 at 10:06 AM peter enderborg
> > <peter.enderborg@xxxxxxxx> wrote:
> >> On 4/20/21 3:44 AM, Shakeel Butt wrote:
> > [...]
> >> I think this is the wrong way to go.
> > Which one? Are you talking about the kernel one? We already talked
> > ourselves out of that. To decide to OOM, we need to look at a very
> > diverse set of metrics and it seems like that would be very hard to
> > do flexibly inside the kernel.
> You don't need to decide to oom, but when oom occurs you
> can take proper action.

No, we want the flexibility to decide when to oom-kill. The kernel is
very conservative in triggering the oom-kill.
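
To illustrate the kind of flexibility meant here, below is a minimal sketch
of a userspace policy check driven by PSI. It parses text in the format of
/proc/pressure/memory and applies a threshold. The function names, the
sample values, and the 10% "full avg10" limit are assumptions made up for
illustration, not a recommendation or anyone's actual policy.

```python
def parse_psi(text):
    """Parse PSI text (the format of /proc/pressure/memory) into a dict."""
    result = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def should_oom_kill(psi, full_avg10_limit=10.0):
    """Hypothetical policy: act once all tasks were stalled on memory for
    at least full_avg10_limit percent of the last 10 seconds."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


# Made-up sample values; a real daemon would read /proc/pressure/memory.
SAMPLE = (
    "some avg10=12.50 avg60=4.20 avg300=1.00 total=123456\n"
    "full avg10=11.00 avg60=3.00 avg300=0.50 total=65432\n"
)
```

A real userspace oom-killer would combine this with other metrics (cgroup
usage, refault rates, application-level signals), which is exactly the part
that is hard to express flexibly in the kernel.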

> >
[...]
> > Actually no. It is missing the flexibility to monitor the metrics a
> > user cares about and on which they base the decision to trigger an
> > oom-kill. Not sure how a watchdog will replace psi/vmpressure. That
> > userspace keeps petting the watchdog does not mean that the system is
> > not suffering.
>
> Userspace should very much do what it does. But when it
> does not do what it should, including kicking the WD, then
> the kernel kicks in and kills a predefined process, or as many
> as needed, until the monitoring can start to kick again and
> regain control.
>

Roman already suggested something similar (i.e. splitting the oom-killer
into a core part and an extended part, with the core watching the
extended) but completely in userspace. I don't see why we would want to
do that in the kernel instead.
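
A rough sketch of what "core watching extended" could look like in
userspace: the extended part pets a heartbeat, and the core runs a fallback
action if the heartbeat goes stale. The class name, the timeout, and the
fallback callback are hypothetical and only meant to show the shape of the
idea.

```python
import time


class HeartbeatWatchdog:
    """Core side of a hypothetical core/extended oom-killer split."""

    def __init__(self, timeout_s, fallback, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.fallback = fallback      # e.g. kill a predefined victim
        self.clock = clock            # injectable so it can be tested
        self.last_pet = clock()

    def pet(self):
        """Called by the extended oom-killer while it is making progress."""
        self.last_pet = self.clock()

    def check(self):
        """Called periodically by the core; returns True if it had to act."""
        if self.clock() - self.last_pet > self.timeout_s:
            self.fallback()
            # Restart the window so the fallback fires at most once per timeout.
            self.last_pet = self.clock()
            return True
        return False
```

This only catches an extended part that is stuck or dead. A healthy
heartbeat still says nothing about whether the system is suffering, so it
complements psi/vmpressure monitoring rather than replacing it.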

> >
> > In addition, oom priorities change dynamically and changing them in
> > your system seems very hard. Cgroup awareness is missing too.
>
> Why is that hard? Moving an object in an rb-tree is as good as it gets.
>

It is a group of objects. Anyway, that is an implementation detail.

The message I got from this exchange is that we can have a watchdog
(userspace or kernel) to further improve the reliability of userspace
oom-killers.