[RFC PATCH 0/2] mm: fix OOMs for binding workloads to movable zone only node

From: Feng Tang
Date: Wed Nov 04 2020 - 01:19:54 EST


Hi,

This patchset reports a problem and asks for suggestions/review
of the RFC fix patches.

We recently got an OOM report: when a user tries to bind a docker
(container) instance to a memory node which only has movable zones,
the OOM killer is invoked, and OOM killing still can't resolve the
page allocation failure.

The callstack was:

[ 1387.877565] runc:[2:INIT] invoked oom-killer: gfp_mask=0x500cc2(GFP_HIGHUSER|__GFP_ACCOUNT), order=0, oom_score_adj=0
[ 1387.877568] CPU: 8 PID: 8291 Comm: runc:[2:INIT] Tainted: G W I E 5.8.2-0.g71b519a-default #1 openSUSE Tumbleweed (unreleased)
[ 1387.877569] Hardware name: Dell Inc. PowerEdge R640/0PHYDR, BIOS 2.6.4 04/09/2020
[ 1387.877570] Call Trace:
[ 1387.877579] dump_stack+0x6b/0x88
[ 1387.877584] dump_header+0x4a/0x1e2
[ 1387.877586] oom_kill_process.cold+0xb/0x10
[ 1387.877588] out_of_memory.part.0+0xaf/0x230
[ 1387.877591] out_of_memory+0x3d/0x80
[ 1387.877595] __alloc_pages_slowpath.constprop.0+0x954/0xa20
[ 1387.877599] __alloc_pages_nodemask+0x2d3/0x300
[ 1387.877602] pipe_write+0x322/0x590
[ 1387.877607] new_sync_write+0x196/0x1b0
[ 1387.877609] vfs_write+0x1c3/0x1f0
[ 1387.877611] ksys_write+0xa7/0xe0
[ 1387.877617] do_syscall_64+0x52/0xd0
[ 1387.877621] entry_SYSCALL_64_after_hwframe+0x44/0xa9
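
As a side note on the gfp_mask in the first line of the log, here is a
minimal userspace sketch that rebuilds 0x500cc2 from its flag bits. The
bit values are assumed from the v5.8-era include/linux/gfp.h (they change
between kernel versions), and the TOY_* names and helpers are hypothetical,
not kernel API. It only illustrates that the request does not carry
__GFP_MOVABLE.

```c
#include <stdbool.h>

/*
 * Flag bits as assumed from the v5.8-era include/linux/gfp.h.
 * These values are NOT stable across kernel versions.
 */
#define TOY_GFP_HIGHMEM        0x02u
#define TOY_GFP_MOVABLE        0x08u
#define TOY_GFP_IO             0x40u
#define TOY_GFP_FS             0x80u
#define TOY_GFP_DIRECT_RECLAIM 0x400u
#define TOY_GFP_KSWAPD_RECLAIM 0x800u
#define TOY_GFP_HARDWALL       0x100000u
#define TOY_GFP_ACCOUNT        0x400000u

/*
 * GFP_HIGHUSER = GFP_USER | __GFP_HIGHMEM, where
 * GFP_USER = __GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL and
 * __GFP_RECLAIM = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM.
 */
static unsigned int toy_gfp_highuser(void)
{
	return TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM |
	       TOY_GFP_IO | TOY_GFP_FS | TOY_GFP_HARDWALL |
	       TOY_GFP_HIGHMEM;
}

/* True if the request is allowed to be placed in a movable zone. */
static bool toy_gfp_is_movable(unsigned int gfp)
{
	return (gfp & TOY_GFP_MOVABLE) != 0;
}
```

Under these assumptions, toy_gfp_highuser() | TOY_GFP_ACCOUNT equals the
reported 0x500cc2, and toy_gfp_is_movable() is false for it.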

The meminfo log only shows the movable-only node, which has plenty
of free memory. And when we reproduce the problem with patch 1/2
applied, the normal node (which has DMA/DMA32/Normal zones) also has
a lot of free memory when the OOM happens.

If we hack the kernel so that this (GFP_HIGHUSER|__GFP_ACCOUNT) request
gets a page, a subsequent full docker run (like installing and running
the 'stress-ng' stress test) hits more allocation failures from other
kinds of requests (gfp_masks). Patch 2/2 detects the case where the
allowed target nodes only have movable zones and loosens the binding
check. Otherwise the allocation triggers an OOM, and OOM killing
doesn't help, as the problem is not a lack of free memory.
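
To make the detection idea concrete, here is a userspace toy model. It is
a sketch only and not the code of patch 2/2: the toy_node structure, the
bitmask of allowed nodes and the function names are all hypothetical. It
assumes a non-movable request can never be served from a movable zone, so
binding such a request to movable-only nodes cannot succeed no matter how
much memory is free or how many tasks are killed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified zone list; a real kernel's set depends on the config. */
enum toy_zone {
	TOY_ZONE_DMA,
	TOY_ZONE_DMA32,
	TOY_ZONE_NORMAL,
	TOY_ZONE_MOVABLE,
	TOY_NR_ZONES
};

/* Hypothetical per-node summary: which zones have memory at all. */
struct toy_node {
	bool populated[TOY_NR_ZONES];
};

/*
 * True if at least one node is allowed and every allowed node has
 * memory only in the movable zone. Bit i of 'allowed' stands for node i.
 */
static bool toy_allowed_nodes_only_movable(const struct toy_node *nodes,
					   size_t nr_nodes,
					   unsigned long allowed)
{
	bool any_allowed = false;

	for (size_t i = 0; i < nr_nodes; i++) {
		if (!(allowed & (1UL << i)))
			continue;
		any_allowed = true;
		for (int z = 0; z < TOY_ZONE_MOVABLE; z++)
			if (nodes[i].populated[z])
				return false;
	}
	return any_allowed;
}

/*
 * A non-movable request bound to movable-only nodes can never be
 * satisfied there, so the binding would have to be relaxed instead
 * of invoking the OOM killer.
 */
static bool toy_should_relax_binding(const struct toy_node *nodes,
				     size_t nr_nodes,
				     unsigned long allowed,
				     bool movable_request)
{
	return !movable_request &&
	       toy_allowed_nodes_only_movable(nodes, nr_nodes, allowed);
}
```

With node 0 holding DMA/DMA32/Normal and node 1 holding only Movable, a
non-movable request bound to node 1 alone is the only combination for
which toy_should_relax_binding() returns true.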

Feng Tang (2):
mm, oom: dump meminfo for all memory nodes
mm, page_alloc: loose the node binding check to avoid helpless oom
killing

mm/oom_kill.c | 2 +-
mm/page_alloc.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)

--
2.7.4