Re: OOM killer not nearly agressive enough?

From: Vito Caputo
Date: Thu Jan 09 2020 - 16:52:46 EST


On Thu, Jan 09, 2020 at 10:03:07PM +0100, Pavel Machek wrote:
> On Thu 2020-01-09 12:56:33, Michal Hocko wrote:
> > On Tue 07-01-20 21:44:12, Pavel Machek wrote:
> > > Hi!
> > >
> > > I updated my userspace to x86-64, and now chromium likes to eat all
> > > the memory and bring the system to standstill.
> > >
> > > Unfortunately, OOM killer does not react:
> > >
> > > I'm now running "ps aux", and it prints one line every 20 seconds or
> > > more. Do we agree that is "unusable" system? I attempted to do kill
> > > from other session.
> >
> > Does sysrq+f help?
>
> May try that next time.
>
> > > Do we agree that OOM killer should have reacted way sooner?
> >
> > This is impossible to answer without knowing what was going on at the
> > time. Was the system threshing over page cache/swap? In other words, is
> > the system completely out of memory or refaulting the working set all
> > the time because it doesn't fit into memory?
>
> Swap was full, so "completely out of memory", I guess. Chromium does
> that fairly often :-(.
>

Have you considered restricting its memory limits a la `ulimit -m`?

I've taken to running browsers in nspawn containers for general
isolation improvements, but this also makes it easy to set cgroup
resource limits like memcg. i.e. --property MemoryMax=2G

This prevents the browser from bogging down the entire system, but it
doesn't prevent thrashing before FF OOMs within its control group.

I do feel there's a problem with the kernel's reclaim algorithm, it
seems far too willing to evict file-backed pages that are recently in
use. But at least with memcg this behavior is isolated to the cgroup,
though it still generates a crapload of disk reads from all the
thrashing.

Regards,
Vito Caputo