Re: [syzbot] kernel BUG in assertfail

From: Dmitry Vyukov
Date: Mon May 31 2021 - 06:32:22 EST


On Mon, May 31, 2021 at 11:27 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> >>>>>
> >>>>> syzbot found the following issue on:
> >>>>>
> >>>>> HEAD commit: 1434a312 Merge branch 'for-5.13-fixes' of git://git.kernel..
> >>>>> git tree: upstream
> >>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=162843f3d00000
> >>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=9f3da44a01882e99
> >>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=a6bf271c02e4fe66b4e4
> >>>>>
> >>>>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>>>
> >>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>>> Reported-by: syzbot+a6bf271c02e4fe66b4e4@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>>>>
> >>>>> assertion failed: !memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE), in fs/btrfs/disk-io.c:3282
> >>>>
> >>>> This means a device contains a btrfs filesystem which has a different
> >>>> FSID in its superblock than the fsid which all devices part of the same
> >>>> fs_devices should have. This can happen in 2 ways - memory corruption
> >>>> where either of the ->fsid member are corrupted or if there was a crash
> >>>> while a filesystem's fsid was being changed. We need more context about
> >>>> what the test did?
> >>>
> >>> Hi Nikolay,
> >>>
> >>> From a semantic point of view we can consider that it just mounts /dev/random.
> >>> If syzbot comes up with a reproducer it will post it, but you seem to
> >>> already figure out what happened, so I assume you can write a unit
> >>> test for this.
> >>>
> >>
> >> Well no, under normal circumstances this shouldn't trigger. So if syzbot
> >> is doing something stupid as mounting /dev/random then I don't see a
> >> problem here. The assert is there to catch inconsistencies during normal
> >> operation which doesn't seem to be the case here.
> >
> >
> > Does this mean that CONFIG_BTRFS_ASSERT needs to be disabled in any testing?
> > What is it intended for? Or it can only be enabled when mounting known
> > good images? But then I assume even btrfs unit tests mount some
> > invalid images, so it would mean it can't be used even during unit
> > testing?
> >
> > Looking at the output of "grep ASSERT fs/btrfs/*.c" it looks like most
> > of these actually check for something that "must never happen". E.g.
> > some lists/pointers are empty/non-empty in particular states. And
> > "must never happen" checks are for testing scenarios...
> >
> > Taking this particular FSID mismatch assert, should such corrupted
> > images be mounted for end users? Should users be notified? Currently
> > they are mounted and users are not notified, what is the purpose of
> > this assertion?
> >
> > Perhaps CONFIG_BTRFS_ASSERT needs to be split into "must never happen"
> > checks that are enabled during testing and normal if's with pr_err for
> > user notifications?
>
> After going through the code you've convinced me. I just sent a patch
> turning the 2 debugging asserts into full-fledged checks in
> validate_super. So now the correct behavior is to prevent mounting of
> such images. How can I force syzbot to retest with the given patch applied?

syzbot can test patches for issues with reproducers:
http://bit.do/syzbot#testing-patches
but this issue doesn't have a reproducer unfortunately. But I hope
this change is going to be reasonably straightforward. And if/when
this issue happens again after this report is closed with a fix,
syzbot will notify us again. So an absence of any new reports from
syzbot will implicitly mean that everything is fine.