Re: [PATCH] mm, oom_adj: avoid meaningless loop to find processes sharing mm

From: Michal Hocko
Date: Tue Oct 09 2018 - 03:50:20 EST


On Tue 09-10-18 08:35:41, Michal Hocko wrote:
> [I have only now noticed that the patch has been reposted]
>
> On Mon 08-10-18 18:27:39, Tetsuo Handa wrote:
> > On 2018/10/08 17:38, Yong-Taek Lee wrote:
[...]
> > > Thank you for your suggestion. But i think it would be better to seperate to 2 issues. How about think these
> > > issues separately because there are no dependency between race issue and my patch. As i already explained,
> > > for_each_process path is meaningless if there is only one thread group with many threads(mm_users > 1 but
> > > no other thread group sharing same mm). Do you have any other idea to avoid meaningless loop ?
> >
> > Yes. I suggest reverting commit 44a70adec910d692 ("mm, oom_adj: make sure processes
> > sharing mm have same view of oom_score_adj") and commit 97fd49c2355ffded ("mm, oom:
> > kill all tasks sharing the mm").
>
> This would require a lot of other work for something as border line as
> weird threading model like this. I will think about something more
> appropriate - e.g. we can take mmap_sem for read while doing this check
> and that should prevent from races with [v]fork.

Not really. We do not even take the mmap_sem when CLONE_VM is set. So this
is not the way. Doing a proper synchronization seems much harder. So let's
consider what is the worst case scenario. We would basically hit a race
window between copy_signal and copy_mm and the only relevant case would
be OOM_SCORE_ADJ_MIN which wouldn't propagate to the new "thread". OOM
killer could then pick up the "thread" and kill it along with the whole
process group sharing the mm. Well, that is unfortunate indeed and it
breaks the OOM_SCORE_ADJ_MIN contract. There are basically two ways here:
1) do not care, and encourage users to set OOM_SCORE_ADJ_MIN in a saner
way, e.g. before [v]fork & exec, because doing that externally is racy
anyway. Btw. do we know about an actual user who would care?
2) add an OOM_SCORE_ADJ_MIN check and, in the rare case of the race, do
not kill tasks sharing the mm and do not reap the mm.
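
To make 1) concrete, here is a minimal userspace sketch, not a tested
recipe. The helper names write_oom_score_adj() and
spawn_with_oom_score_adj() are made up for illustration; the only real
interface used is /proc/self/oom_score_adj. The launcher sets the value
on itself first, so the child simply inherits it across fork() and
nobody has to poke at the child from outside. Note that the launcher
keeps the value as well, and that lowering it typically needs
CAP_SYS_RESOURCE.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write an oom_score_adj value to a procfs-style
 * file. Returns 0 on success, -1 on failure. */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* procfs write errors may only be reported when the stream is flushed */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: set the value on ourselves *before* fork(), so the
 * child starts with it already in place, then exec the target program.
 * Returns the child pid, or -1 on failure. */
static pid_t spawn_with_oom_score_adj(int value, char *const argv[])
{
	pid_t pid;

	if (write_oom_score_adj("/proc/self/oom_score_adj", value) != 0)
		return -1;

	pid = fork();
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}
	return pid;
}
```

A caller would pass OOM_SCORE_ADJ_MIN (-1000) as the value and the
target command line as argv.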

I would prefer the first, but if this race really has to be addressed then
option 2 sounds more reasonable than the wholesale revert.
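
For illustration only, one possible reading of option 2 as a toy
userspace model. This is not kernel code: struct toy_task and
toy_oom_kill() are invented here, and the exact policy (kill the victim,
skip any mm sharer marked OOM_SCORE_ADJ_MIN, and refuse to reap the mm
in that case) is my assumption about what such a check would look like.

```c
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-in for the few task fields that matter here. */
struct toy_task {
	int mm_id;		/* equal mm_id means a shared address space */
	int oom_score_adj;
	bool killed;
};

/*
 * Kill the victim and the tasks sharing its mm, except sharers marked
 * OOM_SCORE_ADJ_MIN. Returns true if the mm may be reaped, false if a
 * protected sharer was found and the mm has to be left alone.
 */
static bool toy_oom_kill(struct toy_task *tasks, size_t n, size_t victim)
{
	int mm = tasks[victim].mm_id;
	bool can_reap = true;
	size_t i;

	tasks[victim].killed = true;

	for (i = 0; i < n; i++) {
		if (i == victim || tasks[i].mm_id != mm)
			continue;
		if (tasks[i].oom_score_adj == OOM_SCORE_ADJ_MIN) {
			/* the racy case: sharer is protected, victim was not */
			can_reap = false;
			continue;
		}
		tasks[i].killed = true;
	}
	return can_reap;
}
```

In the race described above, the new "thread" is the victim and the
process that was set to OOM_SCORE_ADJ_MIN is the protected sharer, so
the model kills the former, spares the latter and reports that the mm
must not be reaped.
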
--
Michal Hocko
SUSE Labs