Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems
From: David Rientjes
Date: Fri Mar 13 2020 - 18:01:21 EST
On Fri, 13 Mar 2020, Tetsuo Handa wrote:
> > On an UP kernel with swap disabled, you limit a memcg to 100MB and start
> > three processes that each fault 40MB attached to it. Same reproducer as
> > the "mm, oom: make a last minute check to prevent unnecessary memcg oom
> > kills" patch except in that case there are two cores.
> >
>
> I'm not a heavy memcg user. Please provide steps for reproducing your problem
> in a "copy and pastable" way (e.g. bash script, C program).
>
Use Documentation/admin-guide/cgroup-v1/memory.rst or
Documentation/admin-guide/cgroup-v2.rst to set up a memcg depending on
which cgroup version you use, limit it to 100MB, and attach your process
to it.
Run three programs that fault 40MB. To do that, you need to use mmap:
	(void)mmap(NULL, 40 << 20, PROT_READ|PROT_WRITE,
		   MAP_PRIVATE|MAP_ANONYMOUS|MAP_POPULATE, -1, 0);
Have it stall after populating the memory:
	for (;;);
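
Put together, the two pieces above could be wrapped up roughly like the
sketch below. The helper names fault_anon() and stall() are made up for
illustration, and passing -1 as the fd is the portable convention for
MAP_ANONYMOUS rather than something Linux requires.

```c
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MAP_POPULATE */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of private anonymous memory and ask the kernel to
 * pre-fault it (MAP_POPULATE), so the pages are charged to the memcg
 * during the mmap() call. Returns MAP_FAILED on error.
 */
void *fault_anon(size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
}

/* Spin forever so the process keeps holding its memory. */
void stall(void)
{
	for (;;)
		;
}
```

Call fault_anon(40UL << 20) followed by stall() from main(), then start
three copies of the resulting binary from a shell that is attached to the
100MB memcg.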
> > > @@ -1576,6 +1576,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > */
> > > ret = should_force_charge() || out_of_memory(&oc);
> > > mutex_unlock(&oom_lock);
> > > + schedule_timeout_killable(1);
> > > return ret;
> > > }
> > >
> >
> > If current was the process chosen for oom kill, this would actually induce
> > the problem, not fix it.
> >
>
> Why? Memcg OOM path allows using forced charge path if should_force_charge() == true.
>
> Since your lockup report
>
> Call Trace:
> shrink_node+0x40d/0x7d0
> do_try_to_free_pages+0x13f/0x470
> try_to_free_mem_cgroup_pages+0x16d/0x230
> try_charge+0x247/0xac0
> mem_cgroup_try_charge+0x10a/0x220
> mem_cgroup_try_charge_delay+0x1e/0x40
> handle_mm_fault+0xdf2/0x15f0
> do_user_addr_fault+0x21f/0x420
> page_fault+0x2f/0x40
>
> says that allocating thread was calling try_to_free_mem_cgroup_pages() from try_charge(),
> allocating thread must be able to reach mem_cgroup_out_of_memory() from mem_cgroup_oom()
> from try_charge(). And actually
>
> Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB oom_score_adj:0
>
> says that allocating thread did reach mem_cgroup_out_of_memory(). Then, allocating thread
> must be able to sleep at mem_cgroup_out_of_memory() if schedule_timeout_killable(1) is
> in mem_cgroup_out_of_memory().
>
> Also, if current process was chosen for OOM-kill, current process will be able to leave
> try_charge() due to should_force_charge() == true, won't it?
>
> Thus, how can "this would actually induce the problem, not fix it." happen?
The entire issue is that the victim never gets a chance to run because the
allocator doesn't give it a chance to run on a UP system. Your patch is
broken because if the victim is current, it sleeps instead of exiting:
you've lost your golden opportunity to actually exit and ceded control to
the allocator, which will now starve the victim.