Re: [syzbot] upstream boot error: can't ssh into the instance (15)

From: Aleksandr Nogikh
Date: Thu Oct 12 2023 - 12:49:35 EST


For the record: the problems were solved by switching to qemu v8.

#syz invalid
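
A note for anyone reading the quoted KASAN report: the "Pointer tag: [fc], memory tag: [f5]" line can be checked by hand against the faulting address. Below is a minimal Python sketch, assuming the tag is simply the top byte of the reported address, as the report prints it. The helper names are hypothetical and are not part of any kernel or syzkaller tooling.

```python
# Sketch: decode the tag mismatch from the quoted KASAN report.
# Assumption: the pointer tag is the top byte of the 64-bit address,
# which is how the report prints it ("Pointer tag: [fc]").

TAG_SHIFT = 56


def pointer_tag(addr: int) -> int:
    """Return the top byte of a 64-bit address."""
    return (addr >> TAG_SHIFT) & 0xFF


def untagged(addr: int) -> int:
    """Reset the top byte to 0xff, the form the report uses for object addresses."""
    return addr | (0xFF << TAG_SHIFT)


fault_addr = 0xFCFF000002C01008    # "Read at addr fcff000002c01008"
memory_tag = 0xF5                  # "memory tag: [f5]"
object_start = 0xFFFF000002C01000  # "object at ffff000002c01000"

ptr_tag = pointer_tag(fault_addr)
offset = untagged(fault_addr) - object_start

print(f"pointer tag: {ptr_tag:02x}")
print(f"memory tag:  {memory_tag:02x}")
print(f"tags match:  {ptr_tag == memory_tag}")
print(f"offset into object: {offset} bytes")
```

The computed offset agrees with the "located 8 bytes inside of" line in the report, and the differing tags are what the tag check fault in the call trace reflects.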

On Fri, Sep 29, 2023 at 3:36 PM Aleksandr Nogikh <nogikh@xxxxxxxxxx> wrote:
>
>
> On Fri, Sep 29, 2023 at 3:29 PM syzbot <syzbot+be9661ba81a9c1cf6b15@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 9ed22ae6be81 Merge tag 'spi-fix-v6.6-rc3' of git://git.ker..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14b37a7c680000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d4bdf71ec9aec6cc
> > dashboard link: https://syzkaller.appspot.com/bug?extid=be9661ba81a9c1cf6b15
> > compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > userspace arch: arm64
> >
> > Downloadable assets:
> > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-9ed22ae6.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/2c3d5eea45bd/vmlinux-9ed22ae6.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/54444f361432/Image-9ed22ae6.gz.xz
>
> This appears on qemu-system-aarch64 with virt,virtualization=on,mte=on,graphics=on,usb=on.
>
> I've run it locally using the assets above and it seems there are actually two problems behind the report.
>
> 1) For some reason, v7.2 of qemu-system-aarch64 just hangs with "-smp 2" and prints no output.
>
> Interestingly, it all works fine on qemu v8.0.4, so I don't know if it's a qemu or a kernel problem.
> Unfortunately, qemu v8 is still too new for many distributions (syzbot uses debian:bookworm,
> where v7.2 is the latest packaged version).
>
> 2) If I set "-smp 1", it begins to boot, but quickly fails with several messages. First with
>
> [ 0.000000][ T0] ==================================================================
> [ 0.000000][ T0] BUG: KASAN: slab-out-of-bounds in __kasan_slab_alloc+0x7c/0xcc
> [ 0.000000][ T0] Read at addr fcff000002c01008 by task swapper/0
> [ 0.000000][ T0] Pointer tag: [fc], memory tag: [f5]
> [ 0.000000][ T0]
> [ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
> [ 0.000000][ T0] Hardware name: linux,dummy-virt (DT)
> [ 0.000000][ T0] Call trace:
> [ 0.000000][ T0] dump_backtrace+0x94/0xec
> [ 0.000000][ T0] show_stack+0x18/0x24
> [ 0.000000][ T0] dump_stack_lvl+0x48/0x60
> [ 0.000000][ T0] print_report+0x108/0x618
> [ 0.000000][ T0] kasan_report+0x88/0xac
> [ 0.000000][ T0] __do_kernel_fault+0x17c/0x1e8
> [ 0.000000][ T0] do_tag_check_fault+0x78/0x8c
> [ 0.000000][ T0] do_mem_abort+0x44/0x94
> [ 0.000000][ T0] el1_abort+0x40/0x60
> [ 0.000000][ T0] el1h_64_sync_handler+0xd8/0xe4
> [ 0.000000][ T0] el1h_64_sync+0x64/0x68
> [ 0.000000][ T0] __kasan_slab_alloc+0x7c/0xcc
> [ 0.000000][ T0] kmem_cache_alloc+0x144/0x290
> [ 0.000000][ T0] bootstrap+0x2c/0x174
> [ 0.000000][ T0] kmem_cache_init+0x144/0x1c8
> [ 0.000000][ T0] mm_core_init+0x240/0x2d4
> [ 0.000000][ T0] start_kernel+0x220/0x5fc
> [ 0.000000][ T0] __primary_switched+0xb4/0xbc
> [ 0.000000][ T0]
> [ 0.000000][ T0] Allocated by task 0:
> [ 0.000000][ T0] kasan_save_stack+0x3c/0x64
> [ 0.000000][ T0] save_stack_info+0x38/0x118
> [ 0.000000][ T0] kasan_save_alloc_info+0x14/0x20
> [ 0.000000][ T0] __kasan_slab_alloc+0x94/0xcc
> [ 0.000000][ T0] kmem_cache_alloc+0x144/0x290
> [ 0.000000][ T0] bootstrap+0x2c/0x174
> [ 0.000000][ T0] kmem_cache_init+0x134/0x1c8
> [ 0.000000][ T0] mm_core_init+0x240/0x2d4
> [ 0.000000][ T0] start_kernel+0x220/0x5fc
> [ 0.000000][ T0] __primary_switched+0xb4/0xbc
> [ 0.000000][ T0]
> [ 0.000000][ T0] The buggy address belongs to the object at ffff000002c01000
> [ 0.000000][ T0] which belongs to the cache kmem_cache of size 208
> [ 0.000000][ T0] The buggy address is located 8 bytes inside of
> [ 0.000000][ T0] 208-byte region [ffff000002c01000, ffff000002c010d0)
> [ 0.000000][ T0]
> [ 0.000000][ T0] The buggy address belongs to the physical page:
> [ 0.000000][ T0] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x42c01
> [ 0.000000][ T0] flags: 0x1ffc00000000800(slab|node=0|zone=0|lastcpupid=0x7ff|kasantag=0x0)
> [ 0.000000][ T0] page_type: 0xffffffff()
> [ 0.000000][ T0] raw: 01ffc00000000800 fcff000002c01000 dead000000000100 dead000000000122
> [ 0.000000][ T0] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
> [ 0.000000][ T0] page dumped because: kasan: bad access detected
> [ 0.000000][ T0]
> [ 0.000000][ T0] Memory state around the buggy address:
> [ 0.000000][ T0] ffff000002c00e00: f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0
> [ 0.000000][ T0] ffff000002c00f00: f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0
> [ 0.000000][ T0] >ffff000002c01000: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
> [ 0.000000][ T0] ^
> [ 0.000000][ T0] ffff000002c01100: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
> [ 0.000000][ T0] ffff000002c01200: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
> [ 0.000000][ T0] ==================================================================
>
> And then with
>
> [ 8.765595][ T1] ------------[ cut here ]------------
> [ 8.766137][ T1] WARNING: CPU: 0 PID: 1 at drivers/gpu/drm/drm_managed.c:133 drmm_add_final_kfree+0x7c/0x98
> [ 8.767715][ T1] Modules linked in:
> [ 8.768946][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 6.6.0-rc3-syzkaller-00055-g9ed22ae6be81 #0
> [ 8.769970][ T1] Hardware name: linux,dummy-virt (DT)
> [ 8.770655][ T1] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 8.771383][ T1] pc : drmm_add_final_kfree+0x7c/0x98
> [ 8.771878][ T1] lr : drmm_add_final_kfree+0x30/0x98
> [ 8.772388][ T1] sp : ffff80008000bcd0
> [ 8.772772][ T1] x29: ffff80008000bcd0 x28: 0000000000000000 x27: ffff8000823c6068
> [ 8.773750][ T1] x26: ffff8000822c00b0 x25: ffff8000821eed90 x24: ffff800082299df0
> [ 8.774586][ T1] x23: ffff8000823c6078 x22: faff000003c53010 x21: 0000000000000000
> [ 8.775410][ T1] x20: f6ff000003850800 x19: f6ff000003850800 x18: ffffffffffffffff
> [ 8.776238][ T1] x17: ffff80008082a678 x16: ffff8000808dbae8 x15: ffff8000808db33c
> [ 8.777061][ T1] x14: ffff800080248568 x13: ffff800080015698 x12: ffff800081893acc
> [ 8.777884][ T1] x11: ffff8000822c110c x10: ffff8000800145b4 x9 : ffff8000802b8dcc
> [ 8.778747][ T1] x8 : ffff80008000bc90 x7 : 0000000000000000 x6 : 0000000000008000
> [ 8.779554][ T1] x5 : f1ff000003794f00 x4 : 0000000000000000 x3 : 0000000000000020
> [ 8.780369][ T1] x2 : 0000000000000000 x1 : f6ff000003850e38 x0 : f6ff000003850800
> [ 8.781301][ T1] Call trace:
> [ 8.781667][ T1] drmm_add_final_kfree+0x7c/0x98
> [ 8.782209][ T1] __devm_drm_dev_alloc+0xb4/0xd4
> [ 8.782692][ T1] vgem_init+0xac/0x140
> [ 8.783141][ T1] do_one_initcall+0x80/0x1c4
> [ 8.783614][ T1] kernel_init_freeable+0x1c8/0x290
> [ 8.784114][ T1] kernel_init+0x24/0x1e0
> [ 8.784556][ T1] ret_from_fork+0x10/0x20
> [ 8.785109][ T1] ---[ end trace 0000000000000000 ]---
>
> For what it's worth, here are the commands I used to boot qemu:
>
> $ cd /tmp
> $ wget -O - 'https://storage.googleapis.com/syzbot-assets/7153da9da559/Image-9ed22ae6.gz.xz' | unxz > Image-9ed22ae6
> $ wget -O - 'https://storage.googleapis.com/syzbot-assets/384ffdcca292/non_bootable_disk-9ed22ae6.raw.xz' | unxz > non_bootable_disk-9ed22ae6.raw
> $ qemu-system-aarch64 -machine virt,virtualization=on,mte=on,graphics=on,usb=on -cpu max -smp 1 -m 2048 -display none -serial stdio -drive file=/tmp/non_bootable_disk-9ed22ae6.raw,if=none,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -snapshot -kernel /tmp/Image-9ed22ae6
>
>
> I'll tag the report as follows, feel free to update.
>
> #syz set subsystems: arm, dri