Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

From: Jan Kara
Date: Tue Mar 14 2023 - 07:20:45 EST


On Mon 13-03-23 21:17:57, yebin (H) wrote:
> On 2023/3/13 21:01, Jan Kara wrote:
> > On Mon 13-03-23 20:27:34, yebin wrote:
> > > On 2023/3/13 19:57, Jan Kara wrote:
> > > > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > > > > On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > > > > > On 2/13/23 15:56, syzbot wrote:
> > > > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > > > >
> > > > > > > > > HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > > > > > > > > git tree: upstream
> > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > > > > > > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > > > > > >
> > > > > > > > > Downloadable assets:
> > > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > > > > > > mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > > > > > >
> > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > > Reported-by: syzbot+8785e41224a3afd04321@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > >
> > > > > > > > > ==================================================================
> > > > > > > > > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > > > > > >
> > > > > > > > > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
> > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
> > > > > > > > > Call Trace:
> > > > > > > > > <TASK>
> > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > > > > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > > > > > > print_address_description mm/kasan/report.c:306 [inline]
> > > > > > > > > print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > > > > > > kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > > > > > > crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > > ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > > > > > > ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > > > > > > ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > > > > > > ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > > > > > > ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > > > > > > ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > > > > > > ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > > > > > > ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > > > > > > ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > > > > > > ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > > > > > > evict+0x2a4/0x620 fs/inode.c:664
> > > > > > > > > do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > > > > > > __do_sys_unlink fs/namei.c:4368 [inline]
> > > > > > > > > __se_sys_unlink fs/namei.c:4366 [inline]
> > > > > > > > > __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > RIP: 0033:0x7fbc85a8c0f9
> > > > > > > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > > > > > > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > > > > > > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > > > > > > </TASK>
> > > > > > > > >
> > > > > > > > > The buggy address belongs to the physical page:
> > > > > > > > > page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > > > > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > > > > > > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
> > > > > > > > > raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> > > > > > > > > page dumped because: kasan: bad access detected
> > > > > > > > > page_owner tracks the page as freed
> > > > > > > > > page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > > > > > > prep_new_page mm/page_alloc.c:2531 [inline]
> > > > > > > > > get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > > > > > > __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > > > > > > alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > > > > > > allocate_slab mm/slub.c:1998 [inline]
> > > > > > > > > new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > > > > > > ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > > > > > > __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > > > > > > kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > > > > > > mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > > > > > > mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > > > > > > mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > > > > > > mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > > > > > > vma_expand+0x277/0x850 mm/mmap.c:541
> > > > > > > > > mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > > > > > > do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > > > > > > vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > page last free stack trace:
> > > > > > > > > reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > > > > > > free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > > > > > > free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > > > > > > free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > > > > > > free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > > > > > > qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > > > > > > kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > > > > > > __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > > > > > > kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > > > > > > slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > > > > > > slab_alloc_node mm/slub.c:3452 [inline]
> > > > > > > > > kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > > > > > > __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > > > > > > alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > > > > > > alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > > > > > > sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > > > > > > unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > > > > > > sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > > > > > > sock_sendmsg net/socket.c:734 [inline]
> > > > > > > > > __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > > > > > > __do_sys_sendto net/socket.c:2129 [inline]
> > > > > > > > > __se_sys_sendto net/socket.c:2125 [inline]
> > > > > > > > > __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > >
> > > > > > > > > Memory state around the buggy address:
> > > > > > > > > ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > > ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ^
> > > > > > > > > ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ==================================================================
> > > > > > > > >
> > > > > > > > I think the patch from below should fix it.
> > > > > > > >
> > > > > > > > I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > > > > > EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> > > > > > > > superblock in the buffer get corrupted some time after .get_tree
> > > > > > > > (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > > > > > relying on the contents of the buffer, we should rely on the
> > > > > > > > s_desc_size value initialized at __ext4_fill_super() time.
> > > > > > > >
> > > > > > > > If anyone thinks this is good (or bad), or has a more in-depth
> > > > > > > > explanation, please let me know; it will help me better understand the
> > > > > > > > subsystem. In the meantime I'll continue to investigate this and prepare
> > > > > > > > a patch for it.
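A simplified, hypothetical userspace model of the failure mode described above (csum_tail_len(), tail_fits() and desc_size_within_max() are invented names, not ext4 functions): the group descriptor checksum covers bytes from some offset up to the descriptor size, so if that size is re-read from the live superblock buffer after it has been overwritten, the read can run past the descriptor. The 1024-byte limit assumes EXT4_MAX_DESC_SIZE equals EXT4_MIN_BLOCK_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: ext4 defines EXT4_MAX_DESC_SIZE as EXT4_MIN_BLOCK_SIZE (1024). */
#define MODEL_MAX_DESC_SIZE 1024u

/* Hypothetical model: number of descriptor bytes a checksum would cover from
 * 'offset' up to 'desc_size' (zero if the offset is at or past the end). */
static size_t csum_tail_len(uint16_t desc_size, size_t offset)
{
    return offset < desc_size ? (size_t)desc_size - offset : 0;
}

/* 1 if reading that tail stays inside a descriptor buffer of buf_len bytes. */
static int tail_fits(uint16_t desc_size, size_t offset, size_t buf_len)
{
    return offset + csum_tail_len(desc_size, offset) <= buf_len;
}

/* 1 if desc_size respects the EXT4_MAX_DESC_SIZE-style upper bound. */
static int desc_size_within_max(uint16_t desc_size)
{
    return desc_size <= MODEL_MAX_DESC_SIZE;
}
```

With a 64-byte size validated at mount time, a tail starting at offset 32 stays inside a 64-byte descriptor; with a corrupted value of 0xFFFF read back from the buffer, the same computation asks for 65503 bytes.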
> > > > > > > If there's something corrupting the superblock while the filesystem is
> > > > > > > mounted, we need to find what is corrupting the SB and fix *that*. Not try
> > > > > > > to paper over the problem by not using the on-disk data... Maybe journal
> > > > > > > replay is corrupting the value or something like that?
> > > > > > >
> > > > > > > Honza
> > > > > > >
> > > > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > > > doing at
> > > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > > > Will reply to this email thread once I understand what's happening. If
> > > > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > > >
> > > > > I can now explain how the contents of the superblock in the buffer get
> > > > > corrupted. After the ext4 fs is mounted on the target ("./bus"), the
> > > > > reproducer maps 6MB of data starting at offset 0 of the target file
> > > > > ("./bus"), then starts overwriting the data with something else using
> > > > > memcpy, memset and individual byte writes. Does that mean that we
> > > > > shouldn't rely on the contents of the superblock in the buffer after we
> > > > > mount the filesystem? If so, then my patch stands. I'll be happy to
> > > > > extend it if needed. Below is a step-by-step interpretation of the
> > > > > reproducer.
> > > > >
> > > > > We have a strace log for the same bug, but on Android 5.15:
> > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > > >
> > > > > Look for pid 328. Notice that the bpf() syscalls return errors, so I
> > > > > commented them out in the C repro to confirm that they are not the
> > > > > cause. The bug reproduced without the bpf() calls. The C repro is
> > > > > at:
> > > > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > > >
> > > > > Let's look at these calls, just before the bug was hit:
> > > > > [pid 328] open("./bus",
> > > > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > > > 000) = 4
> > > > > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > > > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > > > [pid 328] mmap(0x20000000, 6291456,
> > > > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > > > 5, 0) = 0x20000000
> > > > Yeah, looking at the reproducer, before this the reproducer also mounts
> > > > /dev/loop0 as an ext4 filesystem.
> > > >
> > > > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > > > - /dev/loop0 is bind-mounted (MS_BIND) onto ./bus
> > > > > - then it opens ./bus again, getting a new file descriptor (5)
> > > > > - then it creates a shared mapping of ./bus starting at offset zero. The
> > > > > mapped area is at 0x20000000 and is 0x600000 bytes (6MB) long.
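A minimal sketch of the mechanism, assuming a temporary regular file stands in for the block device so it runs without root: a write through a MAP_SHARED mapping lands directly in the page cache and is visible to other readers of the same file without msync(). With ./bus bind-mounted over /dev/loop0, the mapped pages are the block device's page cache, which is also where the buffers of the mounted ext4 instance (including its superblock) live. write_via_shared_mapping() and demo_shared_mapping() are hypothetical helpers.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill the first 'len' bytes of 'path' through a MAP_SHARED mapping, then read
 * byte 0 back with pread(2) through a second, independent descriptor.
 * Returns that byte (0..255) or -1 on error. 'path' must be >= len bytes. */
static int write_via_shared_mapping(const char *path, size_t len,
                                    unsigned char pattern)
{
    int rc = -1;
    int fd_map = open(path, O_RDWR);
    if (fd_map < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd_map, 0);
    if (p != MAP_FAILED) {
        memset(p, pattern, len);   /* like the reproducer's memset()/memcpy() */

        int fd_read = open(path, O_RDONLY);
        if (fd_read >= 0) {
            unsigned char b;
            if (pread(fd_read, &b, 1, 0) == 1)
                rc = b;            /* no msync() needed: same page cache */
            close(fd_read);
        }
        munmap(p, len);
    }
    close(fd_map);
    return rc;
}

/* Demo on a temporary regular file (a stand-in for the block device). */
static int demo_shared_mapping(void)
{
    char path[] = "/tmp/shared-map-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (ftruncate(fd, 4096) == 0)
        rc = write_via_shared_mapping(path, 4096, 0xAB);
    close(fd);
    unlink(path);
    return rc;
}
```

demo_shared_mapping() should return 0xAB: the reader never touched the mapping, yet it sees the bytes written through it.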
> > > > So the result is that the reproducer modified the block device while it is
> > > > mounted by the filesystem. We know cases like this can crash the kernel and
> > > > it is inherently difficult to fix. We have to trust the buffer cache
> > > > contents as otherwise the performance will be unacceptable. For historical
> > > > reasons we also have to allow modifications of buffer cache while ext4 is
> > > > mounted because tune2fs uses this to e.g. update the label of a mounted
> > > > filesystem.
> > > >
> > > > Long-term we are moving ext4 in a direction where we can disallow block
> > > > device modifications while the fs is mounted, but we are not there yet.
> > > > I've discussed a shorter-term solution to avoid such known problems with
> > > > syzbot developers, and what seems plausible would be a kconfig option to
> > > > disallow writing to a block device when it is exclusively open by someone
> > > > else. But so far I haven't got around to checking whether this would work
> > > > reasonably well. Would
> > > > you be interested in having a look into this?
> > > I am interested in this work. Filesystems are often damaged by writes to
> > > their block devices, which is a headache, and I have always wanted to
> > > eradicate this kind of problem. A few months ago, I tried to add a mount
> > > option to prohibit modification of the block device after it is mounted,
> > > but I ran into several problems that made me abandon the attempt.
> > > First, the 32-bit superblock flags are all used up and would need to be
> > > extended. Second, I don't know how to handle the read-only flag in the
> > > case of multiple mount points.
> > > "disallow writing to a block device when it is exclusively open by someone
> > > else."
> > > -> Perhaps we could add a new ioctl command to control whether write
> > > operations are allowed after the block device has been exclusively
> > > opened. Is this feasible? Do you have any good suggestions?
> > Well, an ioctl() for syzbot would be possible as well, but for a start I'd
> > check whether the kconfig option idea works. Then it will be enough to
> > just make sure all kernels used for fuzzing are built with this option set.
> > Thanks for having a look into this!
>
> In fact, I also want to solve the problem of filesystem damage caused by
> writes to raw disks in production environments. Controlling this directly
> with a kconfig option loses flexibility in production.

I see. But which protections exactly do you want in production? Since you
need to add the ioctl(2) call somewhere to write-protect the device, you
could just as well run "chmod ugo-w <device>" instead, couldn't you? And the
level of protection would be similar.
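A minimal sketch of what "chmod ugo-w <device>" amounts to, written as a C helper (chmod_ugo_minus_w() and demo_on_temp_file() are hypothetical names, and the demo uses a temporary regular file rather than a real device node): it clears the user, group and other write bits and leaves the rest of the mode alone.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Like "chmod ugo-w PATH": clear the user, group and other write bits and keep
 * every other mode bit. Returns 0 on success, -1 with errno set on error.
 * Permission bits are checked at open(2) time, so descriptors that are already
 * open for writing are unaffected, and a process with CAP_DAC_OVERRIDE (such
 * as root) is not stopped by them. */
static int chmod_ugo_minus_w(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    mode_t mode = st.st_mode & 07777;   /* permission, setuid/setgid, sticky */
    mode &= ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
    return chmod(path, mode);
}

/* Demo on a temporary regular file: 0666 -> 0444. Stores the resulting
 * permission bits in *out_mode. Returns 0 on success, -1 on error. */
static int demo_on_temp_file(mode_t *out_mode)
{
    char path[] = "/tmp/ugo-w-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    struct stat st;
    int rc = -1;
    if (chmod(path, 0666) == 0 && chmod_ugo_minus_w(path) == 0 &&
        stat(path, &st) == 0) {
        *out_mode = st.st_mode & 07777;
        rc = 0;
    }
    unlink(path);
    return rc;
}
```

On a device node the same call changes only the node's permission bits; it does not touch the device or any filesystem mounted from it.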

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR