Re: [RFC PATCH 1/2] mm, oom: marks all killed tasks as oom victims

From: Michal Hocko
Date: Mon Oct 22 2018 - 07:12:33 EST


On Mon 22-10-18 19:56:49, Tetsuo Handa wrote:
> On 2018/10/22 19:43, Michal Hocko wrote:
> > On Mon 22-10-18 18:42:30, Tetsuo Handa wrote:
> >> On 2018/10/22 17:48, Michal Hocko wrote:
> >>> On Mon 22-10-18 16:58:50, Tetsuo Handa wrote:
> >>>> Michal Hocko wrote:
> >>>>> --- a/mm/oom_kill.c
> >>>>> +++ b/mm/oom_kill.c
> >>>>> @@ -898,6 +898,7 @@ static void __oom_kill_process(struct task_struct *victim)
> >>>>>  		if (unlikely(p->flags & PF_KTHREAD))
> >>>>>  			continue;
> >>>>>  		do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, PIDTYPE_TGID);
> >>>>> +		mark_oom_victim(p);
> >>>>>  	}
> >>>>>  	rcu_read_unlock();
> >>>>>
> >>>>> --
> >>>>
> >>>> Wrong. Either
> >>>
> >>> You are right. The mm might go away between process_shares_mm and here.
> >>> While your find_lock_task_mm would be correct I believe we can do better
> >>> by using the existing mm that we already have. I will make it a separate
> >>> patch for clarity.
> >>
> >> Still wrong. p->mm == NULL means that we are too late to set TIF_MEMDIE
> >> on that thread. Passing non-NULL mm to mark_oom_victim() won't help.
> >
> > Why would it be too late? Or in other words why would this be harmful?
> >
>
> Setting TIF_MEMDIE after exit_mm() completed is too late.

You are right and I am obviously dense today. I will go with
find_lock_task_mm for now and push the "get rid of TIF_MEMDIE" up in the
todo list. I hope I will get to it some day.
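
For illustration only, here is a minimal user-space sketch of the ordering
problem discussed above. It is not the actual patch. The struct and the
model_* helpers are hypothetical stand-ins, assuming only that
find_lock_task_mm() returns a thread that still has an mm with its task lock
held, or NULL when every thread has already gone through exit_mm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; this is NOT the kernel's task_struct. */
struct model_task {
	bool has_mm;	/* false once the modelled exit_mm() has completed */
	bool locked;	/* models task_lock() being held */
	bool memdie;	/* models TIF_MEMDIE */
};

/*
 * Models the idea behind find_lock_task_mm(): walk the threads of a group
 * and return the first one that still has an mm, with its lock held.
 * Returns NULL (nothing left locked) when every thread has dropped its mm.
 */
static struct model_task *model_find_lock_task_mm(struct model_task *group,
						  size_t n)
{
	assert(group != NULL);
	for (size_t i = 0; i < n; i++) {
		group[i].locked = true;
		if (group[i].has_mm)
			return &group[i];
		group[i].locked = false;
	}
	return NULL;
}

/*
 * Marks a victim only if some thread still owns an mm.
 * Returns true when the flag was set, false when it was already too late.
 */
static bool model_mark_victim_if_alive(struct model_task *group, size_t n)
{
	struct model_task *t = model_find_lock_task_mm(group, n);

	if (!t)
		return false;	/* too late: exit_mm() already completed */
	t->memdie = true;	/* stands in for mark_oom_victim() */
	t->locked = false;	/* models task_unlock() */
	return true;
}
```

In this model a group whose threads have all lost their mm is simply skipped,
which mirrors the "too late to set TIF_MEMDIE" case raised in the thread.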
--
Michal Hocko
SUSE Labs