Re: can't oom-kill zap the victim's memory?

From: Michal Hocko
Date: Fri Oct 02 2015 - 08:36:47 EST


On Tue 29-09-15 01:18:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > The point I've tried to make is that an oom unmapper running in a detached
> > context (e.g. kernel thread) vs. directly in the oom context doesn't
> > make any difference wrt. the lock because the holders of the lock would loop
> > inside the allocator anyway because we do not fail small allocations.
>
> We tried to allow small allocations to fail. It resulted in an unstable
> system with obscure bugs.

Have they been reported/fixed? All kernel paths doing an allocation are
_supposed_ to check for and handle ENOMEM. If they do not then they are
buggy and should be fixed.
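
A minimal userspace sketch of what "check and handle ENOMEM" looks like
in practice. This is not kernel code: malloc() stands in for the kernel
allocator, and struct record, record_init(), record_destroy() and the
fail_second() hook are hypothetical names made up for illustration. The
point is only that every allocation is checked, earlier allocations are
unwound on failure, and the error is propagated to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

static int calls;

/* Hypothetical allocator that fails on its second call only. */
static void *fail_second(size_t size)
{
	return ++calls == 2 ? NULL : malloc(size);
}

struct record {
	char *name;
	int *values;
};

/*
 * Returns 0 on success or -ENOMEM. On failure nothing stays allocated,
 * so the caller can simply pass the error up.
 */
static int record_init(struct record *r, size_t n, alloc_fn alloc)
{
	/* Programming errors are asserted; running out of memory is not one. */
	assert(r && alloc);

	r->name = alloc(32);
	if (!r->name)
		return -ENOMEM;

	r->values = alloc(n * sizeof(*r->values));
	if (!r->values) {
		/* Unwind the first allocation before reporting failure. */
		free(r->name);
		r->name = NULL;
		return -ENOMEM;
	}
	return 0;
}

static void record_destroy(struct record *r)
{
	free(r->values);
	free(r->name);
	r->values = NULL;
	r->name = NULL;
}
```

A path written this way keeps working if small allocations are allowed
to fail; a path that dereferences the result unchecked is where the
obscure bugs come from.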

> We tried to allow small !__GFP_FS allocations to fail. It failed to fail by
> effectively __GFP_NOFAIL allocations.

What do you mean by that? An opencoded __GFP_NOFAIL?
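
For clarity, a rough userspace sketch of the pattern "opencoded
__GFP_NOFAIL" refers to here, under the assumption that it means a
caller retrying a failed allocation in its own loop. The names
alloc_nofail_opencoded() and flaky_alloc() are hypothetical and malloc()
again stands in for the kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

static unsigned int failures_left = 3;
static unsigned int attempts;

/* Hypothetical allocator that fails a few times before succeeding. */
static void *flaky_alloc(size_t size)
{
	attempts++;
	if (failures_left) {
		failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * The caller never sees a failure: it keeps retrying until the
 * allocation succeeds. If memory never becomes available, this loops
 * forever, just as a real no-fail allocation would.
 */
static void *alloc_nofail_opencoded(size_t size, alloc_fn alloc)
{
	void *p;

	assert(alloc);
	do {
		p = alloc(size);
	} while (!p);
	return p;
}
```

With callers like this, letting the allocator return NULL changes
nothing: the loop has only moved from the allocator into the caller.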

> We are now trying to allow zapping the OOM victim's mm. Michal is already
> skeptical about this approach due to the lock dependency.

I am not sure where this came from. I am all for this approach. It will
not solve the problem completely for sure but it can help in many cases
already.

> We already spent 9 months on this OOM livelock. No silver bullet yet.
> Proposed approaches are too drastic to backport for existing users.
> I think we are out of bullets.

Not at all. We have had this problem basically forever. And we have a lot
of legacy issues to care about. But nobody could reasonably expect this
to be solved in a short time period.

> Until we complete adding/testing __GFP_NORETRY (or __GFP_KILLABLE) to most
> of callsites,

This is simply not doable. There are thousands of allocation sites all
over the kernel.

> timeout based workaround will be the only bullet we can use.

Those are a last resort which only papers over real bugs which should
be fixed. I would agree with your urging if this was something that can
easily happen on a _properly_ configured system. A system which can blow
up into an OOM storm is far from being configured properly. If you have
untrusted users running on your system you had better put them into a
highly restricted environment and limit them as much as possible.

I can completely understand your frustration about the pace of
progress here but this is nothing new and we should strive for a long term
vision which would be much less fragile than what we have right now. No
timeout based solution is a step in that direction.
--
Michal Hocko
SUSE Labs