Re: [PATCH] mm: oom_kill_process: do not abort if the victim is exiting
From: Michal Hocko
Date: Wed May 25 2016 - 04:09:54 EST
On Tue 24-05-16 20:07:46, Vladimir Davydov wrote:
> On Tue, May 24, 2016 at 03:50:42PM +0200, Michal Hocko wrote:
[...]
> > It is not really pointless. The original intention was to not spam the
> > log and alarm the administrator when in fact the memory hog is exiting
> > already and will free the memory.
>
> IMO the fact that a process, even an exiting one, enters oom is
> abnormal, indicates that the system is misconfigured, and hence should
> be reported to the admin.
>
> > Those races are quite unlikely but not impossible.
>
> If this case is unlikely, how can it spam the log?

The oom report can be quite large, especially on large setups. The
oom_reaper message will be much shorter and will give a clue that
an exceptional action had to be taken.

> > The original check was much more optimistic; as you said
> > above, we have even removed one part of this heuristic. We can still end
> > up selecting an exiting task which is stuck and we could invoke the oom
> > reaper for it without an excessive oom report. I agree that the current
> > check is still a little bit optimistic but processes sharing the mm
> > (CLONE_VM without CLONE_THREAD/CLONE_SIGHAND) are really rare so I
> > wouldn't bother with them with a high priority.
> >
> > That being said I would prefer to keep the check for now. After the
> > merge window closes I will send other oom enhancements which I have
> > half baked locally and that should make task_will_free_mem much more
> > reliable and the check would serve as a last resort to reduce oom noise.
>
> I don't agree that a message about oom killing an exiting process is
> noise, because that shouldn't happen on a properly configured system.
> To me this racy check looks more like noise in the kernel code. By the
> time we enter oom we should have scanned lru several times to find no
> reclaimable pages. The system must be really sluggish. What's the point
> in deceiving the admin by suppressing the warning?

Well, my understanding of the OOM report is that it should tell you two
things. The first is an overview of the overall memory situation when
the system went OOM, and the second is that something has been _killed_,
along with the criteria by which it was selected (points). While the
first might be interesting for what you write above, the second is not,
and it might even be misleading because we are not killing anything and
the selected task is dying without kernel intervention. So I dunno. I do
not see any strong reason to drop these few lines of code, which
shouldn't be a maintenance burden. task_will_free_mem will need some
changes to be more robust anyway. If you really see a strong reason to
drop it because it would help to debug OOM situations then I won't
insist...
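
For anyone following the thread, the behaviour being argued about can be
modelled roughly as below. This is a simplified userspace sketch, not the
kernel code: struct toy_task, toy_will_free_mem(), toy_oom_kill_process()
and the message text are all hypothetical names made up for illustration,
and the real task_will_free_mem heuristic considers more than one flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	const char *comm;   /* task name, used only in the toy report */
	bool exiting;       /* task is already on its way out */
};

enum toy_oom_action {
	TOY_OOM_WAIT_FOR_EXIT,    /* no report, no kill: victim frees memory itself */
	TOY_OOM_REPORT_AND_KILL   /* print the (potentially large) report and kill */
};

/* Toy analogue of "will this task free its memory soon?" (assumption). */
static bool toy_will_free_mem(const struct toy_task *t)
{
	return t->exiting;
}

/*
 * Toy analogue of the check under discussion: if the selected victim is
 * already exiting, skip the report, since nothing is actually killed.
 */
static enum toy_oom_action toy_oom_kill_process(const struct toy_task *victim)
{
	assert(victim != NULL);

	if (toy_will_free_mem(victim))
		return TOY_OOM_WAIT_FOR_EXIT;

	/* Stand-in for the full OOM report plus the kill itself. */
	printf("toy oom: killing %s\n", victim->comm);
	return TOY_OOM_REPORT_AND_KILL;
}
```

Dropping the check corresponds to removing the early return, so every
selected victim, exiting or not, goes through the report path.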
--
Michal Hocko
SUSE Labs