Re: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task

From: Michal Hocko
Date: Tue Aug 07 2018 - 07:04:10 EST


On Tue 07-08-18 19:15:11, Tetsuo Handa wrote:
[...]
> Of course, if the hard limit is 0, all processes will be killed after all. But
> Michal is ignoring the fact that if the hard limit were not 0, there is a chance
> of saving the next process from being needlessly killed if we waited until "mm of PID=23766
> completed __mmput()" or "mm of PID=23766 failed to complete __mmput() within
> reasonable period".

This is a completely different issue IMHO. I haven't seen reports about
overly eager memcg oom killing so far.

> We can make efforts not to return false at
>
> 	/*
> 	 * This task has already been drained by the oom reaper so there are
> 	 * only small chances it will free some more
> 	 */
> 	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> 		return false;
>
> (I admit that ignoring MMF_OOM_SKIP for once might not be sufficient for memcg
> case), and we can use feedback based backoff like
> "[PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes." *UNTIL*
> we come to the point where the OOM reaper can always reclaim all memory.
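
For reference, the "ignore MMF_OOM_SKIP once" idea quoted above could be
modeled roughly as below. This is a userspace sketch only, not kernel
code and not a proposed patch: struct mm_model, skip_ignored,
test_bit_model() and task_will_free_mem_model() are made-up names, and
MMF_OOM_SKIP_BIT is a stand-in bit index rather than the kernel's value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag bit; the value is illustrative. */
#define MMF_OOM_SKIP_BIT 0

/* Hypothetical model of the per-mm state relevant to this check. */
struct mm_model {
	unsigned long flags;	/* MMF_OOM_SKIP_BIT is set once the reaper is done */
	bool skip_ignored;	/* assumption: records that we already waited once */
};

/* Minimal userspace replacement for the kernel's test_bit(). */
static bool test_bit_model(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/*
 * Returns true if it still looks worthwhile to wait for this mm to free
 * memory instead of selecting another victim.
 */
static bool task_will_free_mem_model(struct mm_model *mm)
{
	if (test_bit_model(MMF_OOM_SKIP_BIT, &mm->flags)) {
		/* First time the flag is seen: allow one more chance. */
		if (!mm->skip_ignored) {
			mm->skip_ignored = true;
			return true;
		}
		/* Already waited once, so do not wait again. */
		return false;
	}
	return true;
}
```

The extra per-mm state and the second decision point are the kind of
added complexity being weighed here.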

The code is quite tricky and I am really reluctant to make it even more
so without seeing that this is really hurting real users with real
workloads.
--
Michal Hocko
SUSE Labs