CGroup-v2: Memory: current impossibly high for empty cgroup

From: Bruno PrÃmont
Date: Thu Feb 15 2018 - 03:49:04 EST


With 4.15.2 kernel I'm hitting a state where a given leaf v2 cgroup is
considering itself as permanently over-limit and OOM-kill any process
I try to move into it (it's currently empty!)


I can't hand out a simple reproducer right now, but it seems during
accounting the counter went "negative".

CGroupv2 is mounted on /sys/fs/cgroup/hosting/ and the leaf-cgroup is
/sys/fs/cgroup/hosting/websrv/:
cgroup.controllers:cpu io memory pids
cgroup.events:populated 0
cgroup.max.depth:max
cgroup.max.descendants:max
cgroup.stat:nr_descendants 0
cgroup.stat:nr_dying_descendants 0
cgroup.type:domain
cpu.max:max 100000
cpu.stat:usage_usec 48830679
cpu.stat:user_usec 25448590
cpu.stat:system_usec 23382088
cpu.stat:nr_periods 0
cpu.stat:nr_throttled 0
cpu.stat:throttled_usec 0
cpu.weight:100
cpu.weight.nice:0
io.bfq.weight:100
io.stat:8:0 rbytes=143360 wbytes=32768 rios=24 wios=7
io.stat:7:0 rbytes=51687424 wbytes=0 rios=12619 wios=0
io.weight:default 100
memory.current:18446744073694965760
memory.events:low 0
memory.events:high 0
memory.events:max 114
memory.events:oom 15
memory.events:oom_kill 8
memory.high:8589934592
memory.low:4294967296
memory.max:17179869184
memory.stat:anon 0
memory.stat:file 0
memory.stat:kernel_stack 0
memory.stat:slab 593920
memory.stat:sock 0
memory.stat:shmem 0
memory.stat:file_mapped 0
memory.stat:file_dirty 0
memory.stat:file_writeback 0
memory.stat:inactive_anon 0
memory.stat:active_anon 0
memory.stat:inactive_file 0
memory.stat:active_file 0
memory.stat:unevictable 0
memory.stat:slab_reclaimable 204800
memory.stat:slab_unreclaimable 389120
memory.stat:pgfault 80983
memory.stat:pgmajfault 259
memory.stat:pgrefill 115
memory.stat:pgscan 384
memory.stat:pgsteal 263
memory.stat:pgactivate 100
memory.stat:pgdeactivate 115
memory.stat:pglazyfree 0
memory.stat:pglazyfreed 0
memory.stat:workingset_refault 0
memory.stat:workingset_activate 0
memory.stat:workingset_nodereclaim 3
pids.current:0
pids.events:max 0
pids.max:max

cgroup.procs is empty.


Possibly interesting content in kernel log:
[580391.746367] WARNING: CPU: 1 PID: 24354 at /data/kernel/linux-4.15/mm/page_counter.c:27 page_counter_cancel+0x10/0x20
[580391.746498] Modules linked in: floppy
[580391.746551] CPU: 1 PID: 24354 Comm: image-starter Not tainted 4.15.2-x86_64-vmware #2
[580391.746612] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1508272355 08/27/2015
[580391.746716] RIP: 0010:page_counter_cancel+0x10/0x20
[580391.746767] RSP: 0018:ffffad4f8475fb08 EFLAGS: 00010293
[580391.746819] RAX: 0000000000000005 RBX: ffffad4f8475fb38 RCX: 000000000000000e
[580391.746877] RDX: ffffa058b6a51148 RSI: 000000000000000e RDI: ffffa058b6a51148
[580391.746935] RBP: 000000000000000e R08: 0000000000022b30 R09: 0000000000000c4c
[580391.746993] R10: 0000000000000004 R11: ffffffffffffffff R12: ffffad4f8475fb38
[580391.747051] R13: ffffffffbbc68df8 R14: ffffad4f8475fd30 R15: 0000000000000001
[580391.747117] FS: 00007fd5e3b84740(0000) GS:ffffa058bdd00000(0000) knlGS:0000000000000000
[580391.747178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[580391.747233] CR2: 00000000016fa3e8 CR3: 0000000005c0a001 CR4: 00000000000606a0
[580391.747344] Call Trace:
[580391.747400] page_counter_uncharge+0x16/0x20
[580391.747450] uncharge_batch+0x2e/0x150
[580391.747503] mem_cgroup_uncharge_list+0x54/0x60
[580391.747555] release_pages+0x2f6/0x330
[580391.747609] __pagevec_release+0x25/0x30
[580391.747659] truncate_inode_pages_range+0x251/0x6e0
[580391.747712] ? write_cache_pages+0x311/0x350
[580391.747763] __blkdev_put+0x6f/0x1e0
[580391.747815] deactivate_locked_super+0x2a/0x60
[580391.747870] cleanup_mnt+0x40/0x60
[580391.747921] task_work_run+0x7b/0xa0
[580391.747975] do_exit+0x38d/0x960
[580391.749543] do_group_exit+0x99/0xa0
[580391.750525] SyS_exit_group+0xb/0x10
[580391.751457] do_syscall_64+0x74/0x120
[580391.752427] entry_SYSCALL_64_after_hwframe+0x21/0x86
[580391.753370] RIP: 0033:0x7fd5e3463529
[580391.754290] RSP: 002b:00007ffdaad73478 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[580391.755205] RAX: ffffffffffffffda RBX: 000000002d220011 RCX: 00007fd5e3463529
[580391.756119] RDX: 0000000000000000 RSI: 00007ffdaad732dc RDI: 0000000000000000
[580391.758421] RBP: 0000000000000000 R08: 000000000000003c R09: 00000000000000e7
[580391.759590] R10: ffffffffffffff80 R11: 0000000000000206 R12: 00007ffdaad72480
[580391.760499] R13: 0000000000001000 R14: 00000000016f8080 R15: 0000000000000000
[580391.761366] Code: eb 07 e8 f4 26 fb ff eb d0 48 c7 c7 00 40 c3 bb e8 46 50 44 00 89 d8 5b 5d c3 90 48 89 f0 48 f7 d8 f0 48 0f c1 07 48 39 f0 79 02 <0f> ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 eb 19 48 89 f0
[580391.763136] ---[ end trace 60c773f283acdb6f ]---
[580393.195186] image-starter invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
[580393.199401] image-starter cpuset=/ mems_allowed=0
[580393.201491] CPU: 1 PID: 28447 Comm: image-starter Tainted: G W 4.15.2-x86_64-vmware #2
[580393.202899] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1508272355 08/27/2015
[580393.202901] Call Trace:
[580393.202918] dump_stack+0x5c/0x85
[580393.202927] dump_header+0x5a/0x233
[580393.202933] oom_kill_process+0x91/0x3e0
[580393.202936] out_of_memory+0x221/0x240
[580393.202940] mem_cgroup_out_of_memory+0x36/0x50
[580393.202942] mem_cgroup_oom_synchronize+0x1ff/0x300
[580393.202945] ? __mem_cgroup_insert_exceeded+0x90/0x90
[580393.202947] pagefault_out_of_memory+0x1f/0x4d
[580393.202951] __do_page_fault+0x323/0x390
[580393.202957] ? page_fault+0x36/0x60
[580393.202959] page_fault+0x4c/0x60
[580393.202964] RIP: 0033:0x7f9edd31c919
[580393.202966] RSP: 002b:00007ffd818f4290 EFLAGS: 00010246
[580393.202968] Task in /websrv killed as a result of limit of /websrv
[580393.202972] memory: usage 18446744073709537436kB, limit 16777216kB, failcnt 44
[580393.202973] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[580393.202975] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[580393.202976] Memory cgroup stats for /websrv: cache:16KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:16KB inactive_anon:0KB active_anon:0KB inactive_file:8KB active_file:8KB unevictable:0KB
[580393.202983] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[580393.203001] [28447] 0 28447 2143 31 57344 0 0 image-starter
[580393.203004] [28448] 0 28448 2143 34 57344 0 0 image-starter
[580393.203006] [28449] 0 28449 2143 34 57344 0 0 image-starter
[580393.203008] [28450] 0 28450 2143 34 57344 0 0 image-starter
[580393.203010] [28451] 0 28451 2143 34 57344 0 0 image-starter
[580393.203012] Memory cgroup out of memory: Kill process 28449 (image-starter) score 0 or sacrifice child
[580393.203026] Killed process 28449 (image-starter) total-vm:8572kB, anon-rss:136kB, file-rss:0kB, shmem-rss:0kB
... (other processes being OOM-killed)


image-starter is a small kind of container managing tool that makes use of
namespaces and assigns namespaced processes to corresponding cgroups (pertinent
cgroup directories are bind-mounted into the mount namespace).


The probable memory-counter-underflow seems to have happened either on stop of
container (likely) or on new start of it.

Possibly of interest, some mmaps inside cgroup did fail during previous run of
container due to RLIMIT_DATA.


As this is a test system I can happily perform debugging and experimentation on it
(I'm currently working at moving from v1 cgroups to v2 cgroups).

Cheers,
Bruno