Re: BUG: Bad page state in process - page dumped because: page still charged to cgroup

From: Roman Gushchin
Date: Thu Jul 02 2020 - 11:59:54 EST


On Thu, Jul 02, 2020 at 09:25:08PM +0530, Naresh Kamboju wrote:
> On Thu, 2 Jul 2020 at 21:19, Roman Gushchin <guro@xxxxxx> wrote:
> >
> > On Thu, Jul 02, 2020 at 12:22:03PM +0530, Naresh Kamboju wrote:
> > > On Thu, 2 Jul 2020 at 00:16, Roman Gushchin <guro@xxxxxx> wrote:
> > > >
> > > > On Wed, Jul 01, 2020 at 10:29:04AM +0200, Michal Hocko wrote:
> > > > > Smells like a different observable problem with the same/similar culprit
> > > > > as http://lkml.kernel.org/r/CA+G9fYtrgF_EZHi0vi+HyWiXT5LGggDhVXtNspc=OzzFhL=xRQ@xxxxxxxxxxxxxx
> > > > >
> > > > > On Wed 01-07-20 13:48:57, Naresh Kamboju wrote:
> > > > > > While running LTP mm test suite on x86_64 device the BUG: Bad page
> > > > > > state in process
> > > > > > noticed on linux-next 20200630 tag.
> > > > > >
> > > > > > Steps to reproduce:
> > > > > > - boot linux-next 20200630 kernel on x86_64 device
> > > > > > - cd /opt/ltp
> > > > > > - ./runltp -f mm
> > > > > >
> > > > > > metadata:
> > > > > > git branch: master
> > > > > > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> > > > > > git commit: f2b92b14533e646e434523abdbafddb727c23898
> > > > > > git describe: next-20200630
> > > > > > kernel-config:
> > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__builds.tuxbuild.com_j60yrp7CUpq3LCmqMB8Wdg_kernel.config&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=h_KJ0e7abuh0BK2eDlDmWnAxqHPccpqchPgBS-oJcVE&s=qofg2XRToTeHvi8vSdOvDPtKpJsUqf3IWfqwieZqITg&e=
> > > > > >
> > > > > > Test crash dump:
> > > > > > [ 803.905169] Node 0 Normal: 2608*4kB (UMEH) 1380*8kB (UMEH) 64*16kB
> > > > > > (MEH) 28*32kB (MEH) 13*64kB (UMEH) 164*128kB (UMEH) 39*256kB (UE)
> > > > > > 1*512kB (M) 1*1024kB (M) 1*2048kB (M) 1*4096kB (M) = 62880kB
> > > > > > [ 803.922375] Node 0 hugepages_total=0 hugepages_free=0
> > > > > > hugepages_surp=0 hugepages_size=2048kB
> > > > > > [ 803.930806] 2418 total pagecache pages
> > > > > > [ 803.934559] 0 pages in swap cache
> > > > > > [ 803.937878] Swap cache stats: add 0, delete 0, find 0/0
> > > > > > [ 803.943108] Free swap = 0kB
> > > > > > [ 803.945997] Total swap = 0kB
> > > > > > [ 803.948885] 4181245 pages RAM
> > > > > > [ 803.951857] 0 pages HighMem/MovableOnly
> > > > > > [ 803.955695] 626062 pages reserved
> > > > > > [ 803.959016] Tasks state (memory values in pages):
> > > > > > [ 803.963722] [ pid ] uid tgid total_vm rss pgtables_bytes
> > > > > > swapents oom_score_adj name
> > > > > > [ 803.972336] [ 332] 0 332 8529 507 106496
> > > > > > 0 0 systemd-journal
> > > > > > [ 803.981387] [ 349] 0 349 10730 508 118784
> > > > > > 0 -1000 systemd-udevd
> > > > > > [ 803.990262] [ 371] 993 371 8666 108 118784
> > > > > > 0 0 systemd-network
> > > > > > [ 803.999306] [ 379] 992 379 9529 99 110592
> > > > > > 0 0 systemd-resolve
> > > > > > [ 804.008347] [ 388] 0 388 2112 19 61440
> > > > > > 0 0 syslogd
> > > > > > [ 804.016709] [ 389] 995 389 9308 108 122880
> > > > > > 0 0 avahi-daemon
> > > > > > [ 804.025517] [ 391] 0 391 1075 21 57344
> > > > > > 0 0 acpid
> > > > > > [ 804.033695] [ 394] 995 394 9277 68 114688
> > > > > > 0 0 avahi-daemon
> > > > > > [ 804.042476] [ 396] 996 396 7241 154 102400
> > > > > > 0 -900 dbus-daemon
> > > > > > [ 804.051170] [ 397] 0 397 2313 72 65536
> > > > > > 0 0 crond
> > > > > > [ 804.059349] [ 399] 0 399 34025 161 167936
> > > > > > 0 0 thermald
> > > > > > [ 804.067783] [ 400] 0 400 8615 115 110592
> > > > > > 0 0 systemd-logind
> > > > > > [ 804.076734] [ 401] 0 401 2112 32 57344
> > > > > > 0 0 klogd
> > > > > > [ 804.084907] [ 449] 65534 449 3245 39 69632
> > > > > > 0 0 dnsmasq
> > > > > > [ 804.093254] [ 450] 0 450 3187 33 73728
> > > > > > 0 0 agetty
> > > > > > [ 804.101541] [ 452] 0 452 3187 33 73728
> > > > > > 0 0 agetty
> > > > > > [ 804.109826] [ 453] 0 453 14707 107 159744
> > > > > > 0 0 login
> > > > > > [ 804.118007] [ 463] 0 463 9532 163 122880
> > > > > > 0 0 systemd
> > > > > > [ 804.126362] [ 464] 0 464 16132 424 172032
> > > > > > 0 0 (sd-pam)
> > > > > > [ 804.134803] [ 468] 0 468 4538 105 81920
> > > > > > 0 0 sh
> > > > > > [ 804.142741] [ 472] 0 472 11102 83 131072
> > > > > > 0 0 su
> > > > > > [ 804.150680] [ 473] 0 473 4538 99 81920
> > > > > > 0 0 sh
> > > > > > [ 804.158637] [ 519] 0 519 2396 57 61440
> > > > > > 0 0 lava-test-runne
> > > > > > [ 804.167700] [ 1220] 0 1220 2396 52 61440
> > > > > > 0 0 lava-test-shell
> > > > > > [ 804.176738] [ 1221] 0 1221 2396 55 61440
> > > > > > 0 0 sh
> > > > > > [ 804.184680] [ 1223] 0 1223 2462 135 61440
> > > > > > 0 0 ltp.sh
> > > > > > [ 804.192946] [ 1242] 0 1242 2462 134 61440
> > > > > > 0 0 ltp.sh
> > > > > > [ 804.201207] [ 1243] 0 1243 2462 134 61440
> > > > > > 0 0 ltp.sh
> > > > > > [ 804.209475] [ 1244] 0 1244 2462 134 61440
> > > > > > 0 0 ltp.sh
> > > > > > [ 804.217742] [ 1245] 0 1245 2561 229 65536
> > > > > > 0 0 runltp
> > > > > > [ 804.226010] [ 1246] 0 1246 1072 15 53248
> > > > > > 0 0 tee
> > > > > > [ 804.234012] [ 1313] 0 1313 1070 29 53248
> > > > > > 0 0 ltp-pan
> > > > > > [ 804.242374] [ 3216] 0 3216 1613 20 53248
> > > > > > 0 0 oom01
> > > > > > [ 804.250554] [ 3217] 0 3217 1646 31 57344
> > > > > > 0 0 oom01
> > > > > > [ 804.258728] [ 3245] 0 3245 81271 469 266240
> > > > > > 0 0 NetworkManager
> > > > > > [ 804.267688] [ 3249] 0 3249 6422 54 98304
> > > > > > 0 0 systemd-hostnam
> > > > > > [ 804.276734] [ 3250] 0 3250 52976 178 172032
> > > > > > 0 0 nm-dispatcher
> > > > > > [ 804.285603] [ 3254] 998 3254 131113 828 245760
> > > > > > 0 0 polkitd
> > > > > > [ 804.293956] [ 3261] 0 3261 4726385 3349389 26939392
> > > > > > 0 0 oom01
> > > > > > [ 804.302129] [ 3265] 0 3265 3187 33 73728
> > > > > > 0 0 agetty
> > > > > > [ 804.310397] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=3261,uid=0
> > > > > > [ 804.322751] Out of memory: Killed process 3261 (oom01)
> > > > > > total-vm:18905540kB, anon-rss:13397556kB, file-rss:0kB, shmem-rss:0kB,
> > > > > > UID:0 pgtables:26308kB oom_score_adj:0
> > > > > > [ 806.652952] oom_reaper: reaped process 3261 (oom01), now
> > > > > > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> > > > > > [ 807.579373] BUG: Bad page state in process kworker/u8:12 pfn:374308
> > > > > > [ 807.579521] BUG: Bad page state in process kworker/u8:13 pfn:4182a4
> > > > > > [ 807.585734] page:ffffea000dd0c200 refcount:0 mapcount:0
> > > > > > mapping:0000000000000000 index:0xffff88837430e000
> > > > > > head:ffffea000dd0c200 order:3 compound_mapcount:0 compound_pincount:0
> > > > > > [ 807.585736] flags: 0x200000000010000(head)
> > > > > > [ 807.585740] raw: 0200000000010000 ffffea000dce6e00 0000000200000002
> > > > > > 0000000000000000
> > > > > > [ 807.592099] page:ffffea001060a900 refcount:0 mapcount:0
> > > > > > mapping:0000000000000000 index:0xffff8884182a5e00
> > > > > > head:ffffea001060a900 order:1 compound_mapcount:0
> > > > > > [ 807.607719] raw: ffff88837430e000 0000000000040000 00000000ffffffff
> > > > > > ffff8883bda6cac1
> > > > > > [ 807.607720] page dumped because: page still charged to cgroup
> > > > > > [ 807.607720] page->mem_cgroup:ffff8883bda6cac1
> > > > > > [ 807.607721] Modules linked in: x86_pkg_temp_thermal
> > > > > > [ 807.607725] CPU: 0 PID: 3242 Comm: kworker/u8:12 Not tainted
> > > > > > 5.8.0-rc3-next-20200630 #1
> > > > > > [ 807.607727] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
> > > > > > 2.0b 07/27/2017
> > > > > > [ 807.607731] Workqueue: rpciod rpc_async_schedule
> > > > > > [ 807.611836] flags: 0x200000000010000(head)
> > > > > > [ 807.619563] Call Trace:
> > > > > > [ 807.619567] dump_stack+0x84/0xba
> > > > > > [ 807.619569] bad_page.cold+0x7b/0xac
> > > > > > [ 807.619573] __free_pages_ok+0x95b/0xab0
> > > > > > [ 807.633461] raw: 0200000000010000 dead000000000100 dead000000000122
> > > > > > 0000000000000000
> > > > > > [ 807.641189] __free_pages+0x42/0x50
> > > > > > [ 807.641191] __free_slab+0xcd/0x1f0
> > > >
> > > > Hm, interesting, it means that page->obj_cgroups is still set.
> > > > But before __free_pages() __free_slab() always calls uncharge_slab_page(),
> > > > which sets page->obj_cgroups to NULL except when !memcg_kmem_enabled().
> > > >
> > > > So it makes me think that somehow memcg_kmem_enabled() became false
> > > > after being true, which can cause refcounting problems as well.
> > > >
> > > > Naresh, can you, please, check if the following patch solves problems?
> > > > And thank you for reporting the problem!
> > > >
> > >
> > > Thanks for providing a quick fix patch.
> > >
> > > >
> > > > From c97afecd32c0db5e024be9ba72f43d22974f5bcd Mon Sep 17 00:00:00 2001
> > > > From: Roman Gushchin <guro@xxxxxx>
> > > > Date: Wed, 1 Jul 2020 11:05:32 -0700
> > > > Subject: [PATCH] mm: kmem: make memcg_kmem_enabled() irreversible
> > > >
> > > > Historically the kernel memory accounting was an opt-in feature, which
> > > > could be enabled for individual cgroups. But now it's not true, and
> > > > it's on by default both on cgroup v1 and cgroup v2. And as long as a
> > > > user has at least one non-root memory cgroup, the kernel memory
> > > > accounting is on. So in most setups it's either always on (if memory
> > > > cgroups are in use and kmem accounting is not disabled), either always
> > > > off (otherwise).
> > > >
> > > > memcg_kmem_enabled() is used in many places to guard the kernel memory
> > > > accounting code. If memcg_kmem_enabled() can reverse from returning
> > > > true to returning false (as now), we can't rely on it on release paths
> > > > and have to check if it was on before.
> > > >
> > > > If we'll make memcg_kmem_enabled() irreversible (always returning true
> > > > after returning it for the first time), it'll make the general logic
> > > > more simple and robust. It also will allow to guard some checks which
> > > > otherwise would stay unguarded.
> > > >
> > > > Signed-off-by: Roman Gushchin <guro@xxxxxx>
> > > Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> > > Tested-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> > >
> > > The below patch applied on top of linux-next 20200701and tested and
> > > the reported bug got fixed on arm64, arm, x86_64 and i386.
> >
> > Hello, Naresh!
> >
> > Thank you very much for testing the patch!
> >
> > Btw, did it also fix the problem reported here:
> > https://lore.kernel.org/linux-mm/20200701081951.GL2369@xxxxxxxxxxxxxx/T/#t ?
>
> Yes it got fixed the reported problem on the above e-mail thread.

Thank you for the confirmation!