Re: [syzbot] [ext4?] WARNING in ext4_file_write_iter
From: Pengfei Xu
Date: Sun Aug 06 2023 - 22:33:04 EST
Hi Brian,
On 2023-08-04 at 09:43:43 -0400, Brian Foster wrote:
> On Thu, Aug 03, 2023 at 09:37:30AM +0800, Pengfei Xu wrote:
> > On 2023-07-05 at 23:33:43 -0700, syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> > >
> > > HEAD commit: 6843306689af Merge tag 'net-6.5-rc1' of git://git.kernel.o..
> > > git tree: net
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=114522aca80000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=7ad417033279f15a
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=5050ad0fb47527b1808a
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=102cb190a80000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17c49d90a80000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/f6adc10dbd71/disk-68433066.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/5c3fa1329201/vmlinux-68433066.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/84db3452bac5/bzImage-68433066.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+5050ad0fb47527b1808a@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 1 PID: 5382 at fs/ext4/file.c:611 ext4_dio_write_iter fs/ext4/file.c:611 [inline]
> > > WARNING: CPU: 1 PID: 5382 at fs/ext4/file.c:611 ext4_file_write_iter+0x1470/0x1880 fs/ext4/file.c:720
> > > Modules linked in:
> > > CPU: 1 PID: 5382 Comm: syz-executor288 Not tainted 6.4.0-syzkaller-11989-g6843306689af #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > > RIP: 0010:ext4_dio_write_iter fs/ext4/file.c:611 [inline]
> > > RIP: 0010:ext4_file_write_iter+0x1470/0x1880 fs/ext4/file.c:720
> > > Code: 84 03 00 00 48 8b 04 24 31 ff 8b 40 20 89 c3 89 44 24 10 83 e3 08 89 de e8 5d 5a 5b ff 85 db 0f 85 d5 fc ff ff e8 30 5e 5b ff <0f> 0b e9 c9 fc ff ff e8 24 5e 5b ff 48 8b 4c 24 40 4c 89 fa 4c 89
> > > RSP: 0018:ffffc9000522fc30 EFLAGS: 00010293
> > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: ffff8880277a3b80 RSI: ffffffff82298140 RDI: 0000000000000005
> > > RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8a832a60
> > > R13: 0000000000000000 R14: 0000000000000000 R15: fffffffffffffff5
> > > FS: 00007f154db95700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f154db74718 CR3: 000000006bcc7000 CR4: 00000000003506e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <TASK>
> > > call_write_iter include/linux/fs.h:1871 [inline]
> > > new_sync_write fs/read_write.c:491 [inline]
> > > vfs_write+0x981/0xda0 fs/read_write.c:584
> > > ksys_write+0x122/0x250 fs/read_write.c:637
> > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
> > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > RIP: 0033:0x7f154dc094f9
> > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > RSP: 002b:00007f154db952f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > RAX: ffffffffffffffda RBX: 00007f154dc924f0 RCX: 00007f154dc094f9
> > > RDX: 0000000000248800 RSI: 0000000020000000 RDI: 0000000000000006
> > > RBP: 00007f154dc5f628 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 652e79726f6d656d
> > > R13: 656c6c616b7a7973 R14: 6465646165726874 R15: 00007f154dc924f8
> > > </TASK>
> > >
> >
> > Above issue in dmesg is:
> > "WARNING: CPU: 1 PID: 5382 at fs/ext4/file.c:611 ext4_dio_write_iter fs/ext4/file.c:611 [inline]"
> >
> > I found the similar behavior issue:
> > "WARNING: CPU: 0 PID: 182134 at fs/ext4/file.c:611 ext4_dio_write_iter fs/ext4/file.c:611 [inline]"
> > repro.report shows similar details.
> >
> > Updated the bisect info for the above similar issue:
> > Bisected and the problem commit was:
> > "
> > 310ee0902b8d9d0a13a5a13e94688a5863fa29c2: ext4: allow concurrent unaligned dio overwrites
> > "
> > After reverted the commit on top of v6.5-rc3, this issue was gone.
> >
>
> Hi Pengfei,
>
> Thanks for narrowing this down (and sorry for missing the earlier
> report). Unfortunately I've not been able to reproduce this locally
> using the generated reproducer. I tried both running the test program on
> a local test vm as well as booting the generated disk image directly.
>
> That said, I have received another report of this warning that happens
> to be related to io_uring. The cause in that particular case is that
> io_uring sets IOCB_HIPRI, which iomap dio turns into
> REQ_POLLED|REQ_NOWAIT on the bio without necessarily having IOCB_NOWAIT
> set on the request. This means we can expect -EAGAIN returns from the
> storage layer without necessarily passing DIO_OVERWRITE_ONLY to iomap,
> which in turn basically means that the warning added by this commit is
> wrong.
>
> I did submit the test patch at the link [1] referenced below to syzbot
> to see if the OVERWRITE_ONLY flag is set and the results I got this
> morning only showed the original !IOCB_NOWAIT warning. So while I still
> do not know the source of the -EAGAIN in the syzbot test (and I would
> like to), this shows that the overwrite flag is not involved and thus
> the -EAGAIN is presumably unrelated to that logic.
>
> So in summary I think the right fix is to just remove the overwrite flag
> and warning from this ext4 codepath. It was always intended as an extra
> precaution to support the warning, and the latter is clearly wrong. I'll
> submit another test change in a separate mail just to see if syzbot
> finds anything else and plan to send a proper patch to the list. In the
> meantime, if you have any suggestions to help reproduce via the
> generated program, I'm still interested in trying to grok where that
> particular -EAGAIN comes from.. Thanks.
Thanks for your patch! I'm glad it's helpful.
It could not be reproduced immediately and it takes more than 200s to
reproduce this issue, anyway here is my reproduced steps.
As following link info for example, it's not my reproduced environment and it's
alsmost the same for issue reproducing:
https://lore.kernel.org/all/0000000000007faf0005fe4f14b9@xxxxxxxxxx/T/#mf1678f17b7b7c4b3cc73abef436aac68787397ae
-> disk image: https://storage.googleapis.com/syzbot-assets/f6adc10dbd71/disk-68433066.raw.xz
-> kernel image: https://storage.googleapis.com/syzbot-assets/84db3452bac5/bzImage-68433066.xz
I used below simple script to boot up vm:
"
#!/bin/bash
bzimage=$1
[[ -z "$bzimage" ]] && bzimage="./bzImage-68433066"
qemu-system-x86_64 \
-m 2G \
-smp 2 \
-kernel $bzimage \
-append "console=ttyS0 root=/dev/sda1 earlyprintk=serial net.ifnames=0 thunderbolt.dyndbg" \
-drive file=./disk-68433066.raw,format=raw \
-net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10023-:22 \
-cpu host \
-net nic,model=e1000 \
-enable-kvm \
-nographic \
2>&1 | tee vmd.log
"
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/
And then access to above vm and execute the reproduced binary to reproduce.
Sometimes it takes some time to reproduce the problem.
Hope the above information helps you to reproduce the problem next time.
I saw syzbot already verified your latest patch and it's passed.
https://syzkaller.appspot.com/x/log.txt?x=17939bc1a80000
Best Regards,
Thanks!
>
> Brian
>
> [1] https://syzkaller.appspot.com/x/patch.diff?x=109a7c96a80000
>
> > All information: https://github.com/xupengfe/syzkaller_logs/tree/main/230730_134501_ext4_file_write_iter
> > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/repro.c
> > repro.prog(syscall reproduced steps): https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/repro.prog
> > repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/repro.report
> > Bisect log: https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/bisect_info.log
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/kconfig_origin
> > Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230730_134501_ext4_file_write_iter/6eaae198076080886b9e7d57f4ae06fa782f90ef_dmesg.log
> >
> > Best Regards,
> > Thanks!
> >
> > >
> > > ---
> > > If you want syzbot to run the reproducer, reply with:
> > > #syz test: git://repo/address.git branch-or-commit-hash
> > > If you attach or paste a git patch, syzbot will apply it before testing.
> >
>