Re: [PATCH 2/2] mm: oom: deduplicate victim selection code for memcg and global oom
From: Vladimir Davydov
Date: Wed Jun 08 2016 - 10:07:31 EST
On Wed, Jun 08, 2016 at 10:33:34AM +0200, Michal Hocko wrote:
> On Fri 27-05-16 17:17:42, Vladimir Davydov wrote:
> [...]
> > @@ -970,26 +1028,25 @@ bool out_of_memory(struct oom_control *oc)
> > !oom_unkillable_task(current, NULL, oc->nodemask) &&
> > current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
> > get_task_struct(current);
> > - oom_kill_process(oc, current, 0, totalpages,
> > - "Out of memory (oom_kill_allocating_task)");
> > + oom_kill_process(oc, current, 0, totalpages);
> > return true;
> > }
>
> Do we really want to introduce sysctl_oom_kill_allocating_task to memcg
> as well?
Not sure, but why not? We take into account dump_tasks and panic_on_oom
on memcg oom so why should we treat this sysctl differently?
> The heuristic is quite dubious even for the global context IMHO
> because it leads to a very random behavior.
>
> > p = select_bad_process(oc, &points, totalpages);
> > /* Found nothing?!?! Either we hang forever, or we panic. */
> > - if (!p && !is_sysrq_oom(oc)) {
> > + if (!p && !is_sysrq_oom(oc) && !oc->memcg) {
> > dump_header(oc, NULL);
> > panic("Out of memory and no killable processes...\n");
> > }
> > if (p && p != (void *)-1UL) {
> > - oom_kill_process(oc, p, points, totalpages, "Out of memory");
> > + oom_kill_process(oc, p, points, totalpages);
> > /*
> > * Give the killed process a good chance to exit before trying
> > * to allocate memory again.
> > */
> > schedule_timeout_killable(1);
> > }
> > - return true;
> > + return !!p;
> > }
>
> Now if you look at out_of_memory() the only shared "heuristic" with the
> memcg part is the bypass for the exiting tasks.
bypass exiting task (task_will_free_mem)
check for panic (check_panic_on_oom)
oom badness evaluation (oom_scan_process_thread or oom_evaluate_task
after your patch)
points calculation + kill (oom_kill_process)
And if you need to modify any of these function calls or add yet another
check, you have to do it twice. Ugly.
> Plus both need the oom_lock.
I believe locking could be unified for global/memcg oom cases too.
> You have to special case oom notifiers, panic on no victim handling and
> I guess the oom_kill_allocating task is not intentional either. So I
> am not really sure this is an improvement. I even hate how we conflate
> sysrq vs. regular global oom context together but my cleanup for that
> has failed in the past.
>
> The victim selection code can be reduced because it is basically
> shared between the two, only the iterator differs. But I guess that
> can be eliminated by a simple helper.
IMHO exporting a bunch of very oom-specific helpers (like those I
enumerated above), partially revealing oom implementation, instead of
well defined memcg helpers that could be reused anywhere else looks
ugly. It's like having shrink_zone implementation both in vmscan.c and
memcontrol.c with shrink_slab, shrink_lruvec, etc. exported, because we
need to iterate over cgroups there.