Re: [syzbot] [virtualization?] upstream boot error: WARNING: refcount bug in __free_pages_ok
From: Stefan Hajnoczi
Date: Wed Mar 20 2024 - 07:30:24 EST
On Tue, Mar 19, 2024 at 03:51:18PM -0500, Mike Christie wrote:
> On 3/19/24 12:19 PM, Stefan Hajnoczi wrote:
> > On Tue, Mar 19, 2024 at 03:40:53AM -0400, Michael S. Tsirkin wrote:
> >> On Tue, Mar 19, 2024 at 12:32:26AM -0700, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot found the following issue on:
> >>>
> >>> HEAD commit: b3603fcb79b1 Merge tag 'dlm-6.9' of git://git.kernel.org/p..
> >>> git tree: upstream
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=10f04c81180000
> >>> kernel config: https://syzkaller.appspot.com/x/.config?x=fcb5bfbee0a42b54
> >>> dashboard link: https://syzkaller.appspot.com/bug?extid=70f57d8a3ae84934c003
> >>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>
> >>> Downloadable assets:
> >>> disk image: https://storage.googleapis.com/syzbot-assets/43969dffd4a6/disk-b3603fcb.raw.xz
> >>> vmlinux: https://storage.googleapis.com/syzbot-assets/ef48ab3b378b/vmlinux-b3603fcb.xz
> >>> kernel image: https://storage.googleapis.com/syzbot-assets/728f5ff2b6fe/bzImage-b3603fcb.xz
> >>>
> >>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>> Reported-by: syzbot+70f57d8a3ae84934c003@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>>
> >>> Key type pkcs7_test registered
> >>> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 239)
> >>> io scheduler mq-deadline registered
> >>> io scheduler kyber registered
> >>> io scheduler bfq registered
> >>> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> >>> ACPI: button: Power Button [PWRF]
> >>> input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input1
> >>> ACPI: button: Sleep Button [SLPF]
> >>> ioatdma: Intel(R) QuickData Technology Driver 5.00
> >>> ACPI: \_SB_.LNKC: Enabled at IRQ 11
> >>> virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
> >>> ACPI: \_SB_.LNKD: Enabled at IRQ 10
> >>> virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
> >>> ACPI: \_SB_.LNKB: Enabled at IRQ 10
> >>> virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
> >>> virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
> >>> N_HDLC line discipline registered with maxframe=4096
> >>> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> >>> 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> >>> 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
> >>> 00:05: ttyS2 at I/O 0x3e8 (irq = 6, base_baud = 115200) is a 16550A
> >>> 00:06: ttyS3 at I/O 0x2e8 (irq = 7, base_baud = 115200) is a 16550A
> >>> Non-volatile memory driver v1.3
> >>> Linux agpgart interface v0.103
> >>> ACPI: bus type drm_connector registered
> >>> [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> >>> [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
> >>> Console: switching to colour frame buffer device 128x48
> >>> platform vkms: [drm] fb0: vkmsdrmfb frame buffer device
> >>> usbcore: registered new interface driver udl
> >>> brd: module loaded
> >>> loop: module loaded
> >>> zram: Added device: zram0
> >>> null_blk: disk nullb0 created
> >>> null_blk: module loaded
> >>> Guest personality initialized and is inactive
> >>> VMCI host device registered (name=vmci, major=10, minor=118)
> >>> Initialized host personality
> >>> usbcore: registered new interface driver rtsx_usb
> >>> usbcore: registered new interface driver viperboard
> >>> usbcore: registered new interface driver dln2
> >>> usbcore: registered new interface driver pn533_usb
> >>> nfcsim 0.2 initialized
> >>> usbcore: registered new interface driver port100
> >>> usbcore: registered new interface driver nfcmrvl
> >>> Loading iSCSI transport class v2.0-870.
> >>> virtio_scsi virtio0: 1/0/0 default/read/poll queues
> >>> ------------[ cut here ]------------
> >>> refcount_t: decrement hit 0; leaking memory.
> >>> WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> >>> Modules linked in:
> >>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.8.0-syzkaller-11567-gb3603fcb79b1 #0
> >>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> >>> RIP: 0010:refcount_warn_saturate+0xfa/0x1d0 lib/refcount.c:31
> >>> Code: b2 00 00 00 e8 57 d4 f2 fc 5b 5d c3 cc cc cc cc e8 4b d4 f2 fc c6 05 0c f9 ef 0a 01 90 48 c7 c7 a0 5d 1e 8c e8 b7 75 b5 fc 90 <0f> 0b 90 90 eb d9 e8 2b d4 f2 fc c6 05 e9 f8 ef 0a 01 90 48 c7 c7
> >>> RSP: 0000:ffffc90000066e18 EFLAGS: 00010246
> >>> RAX: 76f86e452fcad900 RBX: ffff8880210d2aec RCX: ffff888016ac8000
> >>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >>> RBP: 0000000000000004 R08: ffffffff8157ffe2 R09: fffffbfff1c396e0
> >>> R10: dffffc0000000000 R11: fffffbfff1c396e0 R12: ffffea000502cdc0
> >>> R13: ffffea000502cdc8 R14: 1ffffd4000a059b9 R15: 0000000000000000
> >>> FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> >>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> CR2: ffff88823ffff000 CR3: 000000000e132000 CR4: 00000000003506f0
> >>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Call Trace:
> >>> <TASK>
> >>> reset_page_owner include/linux/page_owner.h:25 [inline]
> >>> free_pages_prepare mm/page_alloc.c:1141 [inline]
> >>> __free_pages_ok+0xc54/0xd80 mm/page_alloc.c:1270
> >>> make_alloc_exact+0xa3/0xf0 mm/page_alloc.c:4829
> >>> vring_alloc_queue drivers/virtio/virtio_ring.c:319 [inline]
> >>> vring_alloc_queue_split+0x20a/0x600 drivers/virtio/virtio_ring.c:1108
> >>> vring_create_virtqueue_split+0xc6/0x310 drivers/virtio/virtio_ring.c:1158
> >>> vring_create_virtqueue+0xca/0x110 drivers/virtio/virtio_ring.c:2683
> >>> setup_vq+0xe9/0x2d0 drivers/virtio/virtio_pci_legacy.c:131
> >>> vp_setup_vq+0xbf/0x330 drivers/virtio/virtio_pci_common.c:189
> >>> vp_find_vqs_msix+0x8b2/0xc80 drivers/virtio/virtio_pci_common.c:331
> >>> vp_find_vqs+0x4c/0x4e0 drivers/virtio/virtio_pci_common.c:408
> >>> virtio_find_vqs include/linux/virtio_config.h:233 [inline]
> >>> virtscsi_init+0x8db/0xd00 drivers/scsi/virtio_scsi.c:887
> >>> virtscsi_probe+0x3ea/0xf60 drivers/scsi/virtio_scsi.c:945
> >>> virtio_dev_probe+0x991/0xaf0 drivers/virtio/virtio.c:311
> >>> really_probe+0x29e/0xc50 drivers/base/dd.c:658
> >>> __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
> >>> driver_probe_device+0x50/0x430 drivers/base/dd.c:830
> >>> __driver_attach+0x45f/0x710 drivers/base/dd.c:1216
> >>> bus_for_each_dev+0x239/0x2b0 drivers/base/bus.c:368
> >>> bus_add_driver+0x347/0x620 drivers/base/bus.c:673
> >>> driver_register+0x23a/0x320 drivers/base/driver.c:246
> >>> virtio_scsi_init+0x65/0xe0 drivers/scsi/virtio_scsi.c:1083
> >>> do_one_initcall+0x248/0x880 init/main.c:1238
> >>> do_initcall_level+0x157/0x210 init/main.c:1300
> >>> do_initcalls+0x3f/0x80 init/main.c:1316
> >>> kernel_init_freeable+0x435/0x5d0 init/main.c:1548
> >>> kernel_init+0x1d/0x2b0 init/main.c:1437
> >>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> >>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
> >>> </TASK>
> >>>
> >>
> >> I think I saw this already and also with virtio scsi. virtio
> >> core does not seem to be doing anything special here,
> >> Cc virtio scsi maintainers.
> >
> > The oldest commit that syzkaller found is a memory management pull
> > request:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=e5eb28f6d1afebed4bb7d740a797d0390bd3a357
> >
> > I can't reproduce the issue locally with QEMU 8.2.0 so I don't have a
> > way to bisect.
> >
> > I reviewed the virtio_scsi.c git log and there have been few changes
> > over the last several months. I couldn't spot an issue in this patch,
> > but the most likely virtio-scsi commit is:
> >
> > commit 95e7249691f082a5178d4d6f60fcdee91da458ab
> > Author: Mike Christie <michael.christie@xxxxxxxxxx>
> > Date: Wed Dec 13 23:26:49 2023 -0600
> >
> > scsi: virtio_scsi: Add mq_poll support
> >
> > Stefan
>
> I also tested the current kernel and didn't hit it.
>
> In this mail:
>
> https://lore.kernel.org/all/ZfKPf_pGxv-xtSPN@localhost.localdomain/
>
> from this thread:
>
> https://lore.kernel.org/all/37cb2e7c-97f1-4179-a715-84cc02096083@xxxxxxxxxxxxxxxxxxx/T/
>
> it looks like Oscar is saying he has a fix right?

Yes, here is Oscar's work-in-progress fix:
https://lore.kernel.org/all/20240319183212.17156-1-osalvador@xxxxxxx/

Commit 217b2119b9e2 ("mm,page_owner: implement the tracking of the
stacks count") introduced the issue and it was merged via commit
902861e34c40 ("Merge tag 'mm-stable-2024-03-13-20-04' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm").

To be 100% sure, I'll test the commit in question and its parent:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4bedfb314bdd85c1662ecc46fa25b33b998f994d
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 217b2119b9e260609958db413876f211038f00ee
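
For anyone reading the trace without the refcount code at hand: the
"decrement hit 0; leaking memory" text is what refcount_dec() reports
when the counter it decrements was already at 1 or below, and the trace
shows that path being reached from reset_page_owner(). The sketch below
is a simplified, non-atomic userspace model of that check, not kernel
code. The names model_refcount, model_refcount_dec and MODEL_SATURATED
are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's saturation value. */
#define MODEL_SATURATED (INT_MIN / 2)

/* Hypothetical, non-atomic stand-in for refcount_t. */
struct model_refcount {
	int refs;
};

/*
 * Model of a plain decrement (as opposed to dec-and-test): the caller
 * claims it is NOT dropping the last reference. If the old value was
 * 1 or less, that claim was wrong, so warn and pin the counter at a
 * saturated value instead of letting the object be freed or reused.
 *
 * Returns 1 if the warning fired, 0 otherwise.
 */
static int model_refcount_dec(struct model_refcount *r)
{
	int old;

	assert(r != NULL);

	old = r->refs;
	r->refs = old - 1;

	if (old <= 1) {
		r->refs = MODEL_SATURATED;
		fprintf(stderr, "model: decrement hit 0; leaking memory.\n");
		return 1;
	}
	return 0;
}
```

Starting from a count of 2, the first model_refcount_dec() call is
silent and the second one fires the warning and saturates the counter.
That is why an accounting imbalance shows up as a WARNING at boot
rather than as a use-after-free.
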
Stefan