Re: [RFC 0/4] mm, oom: get rid of TIF_MEMDIE

From: Michal Hocko
Date: Fri Sep 16 2016 - 03:15:32 EST


On Thu 15-09-16 10:41:18, Johannes Weiner wrote:
> Hi Michal,
>
> On Thu, Sep 01, 2016 at 11:51:00AM +0200, Michal Hocko wrote:
> > Hi,
> > this is an early RFC to see whether the approach I've taken is acceptable.
> > The series is on top of the current mmotm tree (2016-08-31-16-06). I didn't
> > get to test it so it might be completely broken.
> >
> > The primary point of this series is to finally get rid of TIF_MEMDIE.
> > Recent changes in the oom proper allow for that, I believe. Now
> > that all the oom victims are reapable we are no longer depending on
> > ALLOC_NO_WATERMARKS because the memory held by the victim is reclaimed
> > asynchronously. A partial access to memory reserves should be sufficient
> > just to guarantee that the oom victim is not starved due to other
> > memory consumers. This also means that we do not have to pretend to be
> > conservative and give access to memory reserves only to one thread from
> > the process at a time. This is patch 1.
> >
> > Patch 2 is a simple cleanup which converts TIF_MEMDIE users to
> > tsk_is_oom_victim, which is process rather than thread centric. None of
> > those callers really needs to be thread aware AFAICS.
> >
> > The tricky part then is exit_oom_victim vs. oom_killer_disable because
> > TIF_MEMDIE acted as a token there so we had a way to count threads from
> > the process. It didn't work 100% reliably and had its own issues, but we
> > have to replace it with something which doesn't rely on counting threads
> > but rather finds a moment when all threads have reached a steady state in
> > do_exit. This is what patch 3 does and I would really appreciate it if Oleg
> > could double check my thinking there. I am also CCing Al on that one
> > because I am moving exit_io_context up in do_exit right before exit_notify.
>
> You're explaining the mechanical thing you are doing, but I'm having
> trouble understanding why you want to get rid of TIF_MEMDIE. For one,
> it's more code. And apparently, it's also more complicated than what
> we have right now.
>
> Can you please explain in the cover letter what's broken/undesirable?

Sure, I will extend the cover letter when submitting the series again. This RFC
was mostly aimed at correctness so I focused more on technical details.
Patch 1 should contain some reasoning. Do you find it sufficient, or should
I extend it?

Thanks!

--
Michal Hocko
SUSE Labs