Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

From: Andrew Morton
Date: Tue Mar 10 2020 - 20:18:38 EST

Next message: Masami Hiramatsu: "Re: Instrumentation and RCU"
Previous message: Eric W. Biederman: "Re: [PATCH v2 5/5] exec: Add a exec_update_mutex to replace cred_guard_mutex"
In reply to: Michal Hocko: "Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems"
Next in thread: David Rientjes: "Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems"

On Tue, 10 Mar 2020 14:39:48 -0700 (PDT) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> When a process is oom killed as a result of memcg limits and the victim
> is waiting to exit, nothing ends up actually yielding the processor back
> to the victim on UP systems with preemption disabled. Instead, the
> charging process simply loops in memcg reclaim and eventually triggers a
> soft lockup.
>
> Memory cgroup out of memory: Killed process 808 (repro) total-vm:41944kB, anon-rss:35344kB, file-rss:504kB, shmem-rss:0kB, UID:0 pgtables:108kB oom_score_adj:0
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [repro:806]
> CPU: 0 PID: 806 Comm: repro Not tainted 5.6.0-rc5+ #136
> RIP: 0010:shrink_lruvec+0x4e9/0xa40
> ...
> Call Trace:
> shrink_node+0x40d/0x7d0
> do_try_to_free_pages+0x13f/0x470
> try_to_free_mem_cgroup_pages+0x16d/0x230
> try_charge+0x247/0xac0
> mem_cgroup_try_charge+0x10a/0x220
> mem_cgroup_try_charge_delay+0x1e/0x40
> handle_mm_fault+0xdf2/0x15f0
> do_user_addr_fault+0x21f/0x420
> page_fault+0x2f/0x40
>
> Make sure that something ends up actually yielding the processor back to
> the victim to allow for memory freeing. The most appropriate place appears
> to be shrink_node_memcgs(), where the iteration over all descendant memcgs
> could be particularly lengthy.
>

That's a bit sad.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2637,6 +2637,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		unsigned long reclaimed;
>  		unsigned long scanned;
>
> +		cond_resched();
> +
>  		switch (mem_cgroup_protected(target_memcg, memcg)) {
>  		case MEMCG_PROT_MIN:
>  			/*

Obviously better, but this will still spin wheels until this task's
timeslice expires, and we might want to do something to help ensure
that the victim runs next (or soon)?
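
To make the concern concrete, here is a rough userspace model of the
situation. It is not kernel code. The names charge_loop(), TIMESLICE and
WATCHDOG_LIMIT are made up for illustration, and the numbers are arbitrary.
It assumes one CPU and no preemption, and relies on the fact that
cond_resched() only switches tasks when a reschedule is already pending,
which in this sketch happens once the timeslice is used up.

```c
#include <stdbool.h>

/* Hypothetical constants, counted in reclaim-loop iterations. */
#define TIMESLICE      100   /* spins before a reschedule becomes pending */
#define WATCHDOG_LIMIT 1000  /* spins the soft lockup watchdog tolerates */

/*
 * Model of the charging task retrying reclaim on a single CPU with
 * preemption disabled. The OOM victim can only exit (and free memory)
 * if the charging task gives up the CPU.
 *
 * Returns the number of iterations spent before the victim got to run,
 * or -1 if the watchdog limit was reached first (the "soft lockup").
 */
long charge_loop(bool use_cond_resched)
{
	bool victim_exited = false;
	long spins = 0;

	while (!victim_exited) {
		bool need_resched;

		spins++;
		if (spins >= WATCHDOG_LIMIT)
			return -1;

		/* The tick marks the task for rescheduling once its slice is gone. */
		need_resched = spins >= TIMESLICE;

		/* Stand-in for cond_resched(): a no-op unless a resched is pending. */
		if (use_cond_resched && need_resched)
			victim_exited = true;	/* victim runs and frees memory */
	}
	return spins;
}
```

In this model charge_loop(false) reports the lockup, while charge_loop(true)
avoids it but still burns the whole TIMESLICE before the victim runs, which
is the wheel-spinning in question.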

(And why is shrink_node_memcgs compiled in when CONFIG_MEMCG=n?)
