Re: Fwd: kernel bug when performing heavy IO operations

From: Matthew Wilcox
Date: Sat Aug 26 2023 - 23:47:06 EST


On Sun, Aug 27, 2023 at 10:20:51AM +0700, Bagas Sanjaya wrote:
> > When the IO load is heavy (compiling AOSP in my case), there's a chance to crash the kernel, the only way to recover is to perform a hard reset. Logs look like follows:
> >
> > 8月 25 13:52:23 arch-pc kernel: BUG: Bad page map in process tmux: client pte:8000000462500025 pmd:b99c98067
> > 8月 25 13:52:23 arch-pc kernel: page:00000000460fa108 refcount:4 mapcount:-256 mapping:00000000612a1864 index:0x16 pfn:0x462500
> > 8月 25 13:52:23 arch-pc kernel: memcg:ffff8a1056ed0000
> > 8月 25 13:52:23 arch-pc kernel: aops:btrfs_aops [btrfs] ino:9c4635 dentry name:"locale-archive"
> > 8月 25 13:52:23 arch-pc kernel: flags: 0x2ffff5800002056(referenced|uptodate|lru|workingset|private|node=0|zone=2|lastcpupid=0xffff)
> > 8月 25 13:52:23 arch-pc kernel: page_type: 0xfffffeff(offline)

This is interesting. PG_offline is set.
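
For anyone decoding the dump by hand, here is a minimal sketch of how the
page_type word maps to "offline". The constants are written from memory of
include/linux/page-flags.h in kernels of roughly this vintage, so treat them
as assumptions and check the tree you are debugging. The helper names are
hypothetical. page_type shares storage with _mapcount, which is why the same
word also explains the odd mapcount in the report.

```python
# Assumed values, recalled from include/linux/page-flags.h around v6.5.
PAGE_TYPE_BASE = 0xF0000000
PAGE_TYPES = {
    "buddy": 0x00000080,
    "offline": 0x00000100,
    "table": 0x00000200,
    "guard": 0x00000400,
}


def decode_page_type(page_type):
    """Return the names of the page types set in a raw page_type word."""
    # A type counts as set when its bit is CLEAR and the base bits are all set.
    return [
        name
        for name, flag in PAGE_TYPES.items()
        if page_type & (PAGE_TYPE_BASE | flag) == PAGE_TYPE_BASE
    ]


def as_mapcount(page_type):
    """Read the same 32-bit word as _mapcount and return the printed mapcount."""
    signed = page_type - (1 << 32) if page_type & 0x80000000 else page_type
    return signed + 1  # _mapcount is stored with a bias of -1


print(decode_page_type(0xFFFFFEFF))
print(as_mapcount(0xFFFFFEFF))
```

Fed the value from the report, the second helper yields -256, which matches
the mapcount:-256 printed on the "page:" line of the dump.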

$ git grep SetPageOffline
arch/powerpc/platforms/powernv/memtrace.c: __SetPageOffline(pfn_to_page(pfn));
drivers/hv/hv_balloon.c: __SetPageOffline(pg);
drivers/hv/hv_balloon.c: __SetPageOffline(pg + j);
drivers/misc/vmw_balloon.c: __SetPageOffline(page + i);
drivers/virtio/virtio_mem.c: __SetPageOffline(page);
drivers/xen/balloon.c: __SetPageOffline(page);
include/linux/balloon_compaction.h: __SetPageOffline(page);
include/linux/balloon_compaction.h: __SetPageOffline(page);

But there's no indication that this kernel is running under a
hypervisor:

> > 8月 25 13:52:23 arch-pc kernel: Hardware name: JGINYUE X99-8D3/2.5G Server/X99-8D3/2.5G Server, BIOS 5.11 06/30/2022

So I'd agree with Artem, this looks like bad RAM.
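
One piece of arithmetic that is at least consistent with that reading (it is
not proof): the reported page_type differs from the all-ones word, which is
what an unmapped page's _mapcount of -1 looks like, in exactly one bit. The
sketch below assumes page_type and _mapcount share the same 32-bit storage,
as in kernels of roughly this vintage; flipped_bits is a hypothetical helper.

```python
UNMAPPED = 0xFFFFFFFF  # _mapcount == -1 viewed as an unsigned 32-bit word (assumed layout)
OBSERVED = 0xFFFFFEFF  # page_type value from the report


def flipped_bits(a, b):
    """Return the positions of the bits that differ between two 32-bit words."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [i for i in range(32) if (diff >> i) & 1]


print(flipped_bits(UNMAPPED, OBSERVED))
```

The only differing position is bit 8, i.e. the 0x100 bit.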

> IMO, this looks like it is introduced by page cache (folio) feature.

... because the string "folio" appears in the crash report? Come on,
Bagas, you can do better than that.