Re: [v7 5/5] mm, oom: cgroup v2 mount option to disable cgroup-aware OOM killer

From: Christopher Lameter
Date: Fri Sep 08 2017 - 17:09:28 EST


On Thu, 7 Sep 2017, David Rientjes wrote:

> > It has *nothing* to do with zillions of tasks. Its amusing that the SGI
> > ghost is still haunting the discussion here. The company died a couple of
> > years ago finally (ok somehow HP has an "SGI" brand now I believe). But
> > there are multiple companies that have large NUMA configurations and they
> > all have configurations where they want to restrict allocations of a
> > process to subset of system memory. This is even more important now that
> > we get new forms of memory (NVDIMM, PCI-E device memory etc). You need to
> > figure out what to do with allocations that fail because the *allowed*
> > memory pools are empty.
> >
>
> We already had CONSTRAINT_CPUSET at the time, this was requested by Paul
> and acked by him in https://marc.info/?l=linux-mm&m=118306851418425.

Ok. Certainly there were scalability issues (lots of them) and the sysctl
may have helped there if set globally. But the ability to kill the
allocating tasks was primarily used in cpusets for constrained allocation.

The issue of scaling is irrelevant in the context of deciding what to do
about the sysctl. You can address the issue differently if it still
exists. The systems with super high NUMA nodes (hundreds to a
thousand) have somehow fallen out of fashion a bit. So I doubt that this
is still an issue. And no one of the old stakeholders is speaking up.

What is the current approach for an OOM occuring in a cpuset or cgroup
with a restricted numa node set?