Re: [PATCH] doc: cgroup: update note about conditions when oom killer is invoked

From: Michal Hocko
Date: Mon May 11 2020 - 06:13:30 EST


On Mon 11-05-20 12:34:00, Konstantin Khlebnikov wrote:
>
>
> On 11/05/2020 11.39, Michal Hocko wrote:
> > On Fri 08-05-20 17:16:29, Konstantin Khlebnikov wrote:
> > > Starting from v4.19 commit 29ef680ae7c2 ("memcg, oom: move out_of_memory
> > > back to the charge path") cgroup oom killer is no longer invoked only from
> > > page faults. Now it implements the same semantics as global OOM killer:
> > > allocation context invokes OOM killer and keeps retrying until success.
> > >
> > > Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
> >
> > Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> >
> > > ---
> > > Documentation/admin-guide/cgroup-v2.rst | 17 ++++++++---------
> > > 1 file changed, 8 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index bcc80269bb6a..1bb9a8f6ebe1 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1172,6 +1172,13 @@ PAGE_SIZE multiple when read back.
> > > Under certain circumstances, the usage may go over the limit
> > > temporarily.
> > > + In default configuration regular 0-order allocation always
> > > + succeed unless OOM killer choose current task as a victim.
> > > +
> > > + Some kinds of allocations don't invoke the OOM killer.
> > > + Caller could retry them differently, return into userspace
> > > + as -ENOMEM or silently ignore in cases like disk readahead.
> >
> > I would probably add -EFAULT but the less error codes we document the
> > better.
>
> Yeah, EFAULT was the most obscure result of memory shortage.
> Fortunately, with the new behaviour this shouldn't happen a lot.

Yes, it shouldn't really happen very often. gup (get_user_pages) was the
most prominent example, but that one should now be taken care of by
triggering the OOM killer. Still, I wouldn't bet my hat that no potential
cases remain.

> Actually, where is it still possible? THP always falls back to 0-order.
> I mean EFAULT could appear inside the kernel only if the task is killed,
> so nobody would see it.

Yes, the fatal_signal_pending paths are OK. And no, I do not have any
specific examples. But as you've said, EFAULT was a real surprise, so I
thought it would be nice to keep a reference to it around, even though it
is unlikely.
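
For anybody who wants to check from userspace whether the memcg OOM
killer was actually invoked, as opposed to an allocation simply failing,
the counters in the cgroup's memory.events file are the place to look.
Below is a minimal sketch of parsing that file. The helper names
(parse_memory_events, oom_summary) and the sample contents are made up
for illustration; only the "key value" line format and the max, oom and
oom_kill keys come from cgroup-v2.rst.

```python
def parse_memory_events(text):
    """Parse the flat 'key value' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, value = parts
        events[key] = int(value)
    return events


def oom_summary(events):
    """Pick out the counters relevant to the OOM discussion (hypothetical helper)."""
    return {
        "max": events.get("max", 0),            # times usage was about to exceed memory.max
        "oom": events.get("oom", 0),            # times the limit was hit and allocation was about to fail
        "oom_kill": events.get("oom_kill", 0),  # processes in this cgroup killed by any OOM killer
    }


# Made-up sample contents, not output from a real system.
sample = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n"
summary = oom_summary(parse_memory_events(sample))
```

On a real system the text would be read from the memory.events file of
the cgroup in question instead of the sample string.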

--
Michal Hocko
SUSE Labs