Re: [syzbot] [mm?] kernel BUG in const_folio_flags

From: David Hildenbrand
Date: Thu Mar 21 2024 - 05:58:15 EST


On 21.03.24 10:49, Muchun Song wrote:


On Mar 21, 2024, at 12:04, syzbot <syzbot+3b9148f91b7869120e81@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: 78c3925c048c Merge tag 'soc-late-6.9' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1267d879180000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3c2635ded15fbc9
dashboard link: https://syzkaller.appspot.com/bug?extid=3b9148f91b7869120e81
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-78c3925c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/cf2bceeccde3/vmlinux-78c3925c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/fc938dfaea6d/bzImage-78c3925c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3b9148f91b7869120e81@xxxxxxxxxxxxxxxxxxxxxxxxx

veth_newlink+0x627/0xa10 drivers/net/veth.c:1895
rtnl_newlink_create net/core/rtnetlink.c:3494 [inline]
__rtnl_newlink+0x119c/0x1960 net/core/rtnetlink.c:3714
rtnl_newlink+0x67/0xa0 net/core/rtnetlink.c:3727
rtnetlink_rcv_msg+0x3c7/0xe60 net/core/rtnetlink.c:6595
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:315!

Here is some more page dump information from the console:

[ 61.367144][ T42] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff888028132880 pfn:0x28130
[ 61.371430][ T42] flags: 0xfff80000000000(node=0|zone=1|lastcpupid=0xfff)
[ 61.374455][ T42] page_type: 0xffffffff()
[ 61.376096][ T42] raw: 00fff80000000000 ffff888015ecd540 dead000000000100 0000000000000000
[ 61.379994][ T42] raw: ffff888028132880 0000000000190000 00000000ffffffff 0000000000000000

Alright, the page is freed (with a refcount of 0).

invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 42 Comm: kcompactd0 Not tainted 6.8.0-syzkaller-11725-g78c3925c048c #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:const_folio_flags+0x1bd/0x1f0 include/linux/page-flags.h:315

The RIP is in const_folio_flags() (called from folio_test_hugetlb()):

VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);

It is reasonable to WARN because the page is freed (PG_head is not set
in this case).
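
To make the quoted condition concrete, here is a minimal userspace sketch of it. It is not the kernel code: struct model_page, model_test_bit(), model_check_fires() and the MODEL_PG_HEAD_BIT position are all hypothetical names and values chosen for illustration. Only the condition itself (n > 0 && !test_bit(PG_head, ...)) is taken from the line quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position for illustration; the real PG_head value
 * depends on the kernel version and configuration. */
#define MODEL_PG_HEAD_BIT 6

/* Hypothetical stand-in for struct page: only the flags word. */
struct model_page {
	uint64_t flags;
};

/* Userspace stand-in for test_bit() on a single 64-bit word. */
static bool model_test_bit(unsigned int bit, uint64_t flags)
{
	return (flags >> bit) & 1;
}

/*
 * Returns true when the quoted condition holds, i.e. when the
 * VM_BUG_ON_PGFLAGS() check would fire:
 *   n > 0 && !test_bit(PG_head, &page->flags)
 */
static bool model_check_fires(const struct model_page *page, unsigned int n)
{
	return n > 0 && !model_test_bit(MODEL_PG_HEAD_BIT, page->flags);
}
```

With the flags word from the dump above (0x00fff80000000000) none of the low flag bits are set, so in this model the condition holds for any n > 0 and does not hold for n == 0.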

The comment on folio_test_hugetlb() says "Caller should have a
reference on the folio", so since commit 9c5ccf2db04b the caller of
PageHuge() should grab a refcount before calling folio_test_hugetlb().
But when the caller does not hold an extra refcount on @page, a true
return from PageHuge(@page) does not mean that @page must be a HugeTLB
page. It seems the WARN could be acceptable, so should we remove it?
I am not sure. Cc'ing more experts.

Isn't this the problem Willy is fixing with the upcoming folio_test_hugetlb() changes?

We cannot always grab a folio reference on hugetlb folios: free hugetlb folios have a refcount of 0.
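
As a rough illustration of that point, here is a userspace sketch of a "take a reference unless the count is zero" helper, loosely modeled on the usual get-unless-zero pattern. struct model_folio and model_try_get() are hypothetical names and this is not the kernel implementation; it only shows why such a helper can never succeed on a folio whose refcount is 0.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the reference count. */
struct model_folio {
	atomic_int refcount;
};

/*
 * Try to take a reference, but only if the count is currently non-zero.
 * Returns true on success and false if the count was (or became) 0.
 */
static bool model_try_get(struct model_folio *folio)
{
	int old = atomic_load(&folio->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&folio->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model a free folio (refcount 0) always makes model_try_get() return false, so a caller that insists on holding a reference before testing the folio type cannot inspect free hugetlb folios at all.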

--
Cheers,

David / dhildenb