Re: possible deadlock in dquot_commit

From: Dmitry Vyukov
Date: Fri Feb 12 2021 - 06:05:59 EST


On Thu, Feb 11, 2021 at 10:46 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Thu, Feb 11, 2021 at 12:47:18PM +0100, Dmitry Vyukov wrote:
> > > This actually looks problematic: We acquired &ei->i_data_sem/2 (i.e.,
> > > I_DATA_SEM_QUOTA subclass) in ext4_map_blocks() called from
> > > ext4_block_write_begin(). This suggests that the write has been happening
> > > directly to the quota file (or that lockdep annotation of the inode went
> > > wrong somewhere). Now we normally protect quota files with IMMUTABLE flag
> > > so writing it should not be possible. We also don't allow clearing this
> > > flag on used quota file. Finally I'd checked lockdep annotation and
> > > everything looks correct. So at this point the best theory I have is that a
> > > filesystem has been suitably corrupted and quota file supposed to be
> > > inaccessible from userspace got exposed but I'd expect other problems to
> > > hit first in that case. Anyway without a reproducer I have no more ideas...
> >
> > There is a reproducer for 4.19 available on the dashboard. Maybe it will help.
> > I don't why it did not pop up on upstream yet, there lots of potential
> > reasons for this.
>
> The 4.19 version of the syzbot report has a very different stack
> trace. Instead of it being related to an apparent write to the quota
> file, it is apparently caused by a call to rmdir:
>
> dump_stack+0x22c/0x33e lib/dump_stack.c:118
> print_circular_bug.constprop.0.cold+0x2d7/0x41e kernel/locking/lockdep.c:1221
> ...
> __mutex_lock+0xd7/0x13f0 kernel/locking/mutex.c:1072
> dquot_commit+0x4d/0x400 fs/quota/dquot.c:469
> ext4_write_dquot+0x1f2/0x2a0 fs/ext4/super.c:5644
> ...
> ext4_evict_inode+0x933/0x1830 fs/ext4/inode.c:298
> evict+0x2ed/0x780 fs/inode.c:559
> iput_final fs/inode.c:1555 [inline]
> ...
> vfs_rmdir fs/namei.c:3865 [inline]
> do_rmdir+0x3af/0x420 fs/namei.c:3943
> __do_sys_unlinkat fs/namei.c:4105 [inline]
> __se_sys_unlinkat fs/namei.c:4099 [inline]
> __x64_sys_unlinkat+0xdf/0x120 fs/namei.c:4099
> do_syscall_64+0xf9/0x670 arch/x86/entry/common.c:293
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Which leads me to another apparent contradiction. Looking at the C
> reproducer source code, and running the C reproducer under "strace
> -ff", there is never any attempt to run rmdir() on the corrupted file
> system that is mounted. Neither as observed by my running the C
> reproducer, or by looking at the C reproducer source code.
>
> Looking at the code, I did see a number of things which seemed to be
> bugs; procid never gets incremented, so all of the threads only
> operate on /dev/loop0, and each call to the execute() function tries
> to setup two file systems on /dev/loop0. So the each thread to run
> creates a temp file, binds it to /dev/loop0, and then creates another
> temp file, tries to bind it to /dev/loop0 (which will fail), tries to
> mount /dev/loop0 (again) on the samee mount point (which will
> succeed).
>
> I'm not sure if this is just some insanity that was consed up by the
> fuzzer... or I'm wondering if this was an unfaithful translation of
> the syzbot repro to C. Am I correct in understanding that when syzbot
> is running, it uses the syzbot repro, and not the C repro?

Hi Ted,

The 4.19 reproducer may reproducer something else, you know better. I
just want to answer points re syzkaller reproducers. FTR the 4.19
reproducer/reproducer is here:
https://syzkaller.appspot.com/bug?id=b6cacc9fa48fea07154b8797236727de981c1e02

> there is never any attempt to run rmdir() on the corrupted file system that is mounted.

Recursive rmdir happens as part of test cleanup implicitly, you can
see rmdir call in remove_dir function in the C reproducer:
https://syzkaller.appspot.com/text?tag=ReproC&x=12caea37900000

> procid never gets incremented, so all of the threads only operate on /dev/loop0

This is intentional. procid is supposed to "isolate" parallel test
processes (if any). This reproducer does not use parallel test
processes, thus procid has constant value.

> Am I correct in understanding that when syzbot is running, it uses the syzbot repro, and not the C repro?

It tries both. If first tries to interpret "syzkaller program" as it
was done when the bug was triggered during fuzzing. But then it tries
to convert it to a corresponding stand-alone C program and confirms
that it still triggers the bug. If it provides a C reproducer, it
means that it did trigger the bug using this exact C program on a
freshly booted kernel (and the provided kernel oops is the
corresponding oops obtained on this exact program).
If it fails to reproduce the bug with a C reproducer, then it provides
only the "syzkaller program" to not mislead developers.