Re: [syzbot] [xfs?] WARNING in xfs_bmap_extents_to_btree

From: Darrick J. Wong
Date: Thu Mar 30 2023 - 21:25:47 EST


On Fri, Mar 31, 2023 at 09:43:02AM +1100, Dave Chinner wrote:
> On Thu, Mar 30, 2023 at 10:52:37AM +0200, Aleksandr Nogikh wrote:
> > On Thu, Mar 30, 2023 at 3:27 AM 'Dave Chinner' via syzkaller-bugs
> > <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Mar 28, 2023 at 09:08:01PM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 1e760fa3596e Merge tag 'gfs2-v6.3-rc3-fix' of git://git.ke..
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=16f83651c80000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=acdb62bf488a8fe5
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0c383e46e9b4827b01b1
> > > > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/17229b6e6fe0/disk-1e760fa3.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/69b5d310fba0/vmlinux-1e760fa3.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/0c65624aace9/bzImage-1e760fa3.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+0c383e46e9b4827b01b1@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 1 PID: 24101 at fs/xfs/libxfs/xfs_bmap.c:660 xfs_bmap_extents_to_btree+0xe1b/0x1190
> > >
> > > Allocation got an unexpected ENOSPC when it was supposed to have a
> > > valid reservation for the space. Likely because of an inconsistency
> > > that had been induced into the filesystem where superblock space
> > > accounting doesn't exactly match the AG space accounting and/or the
> > > tracked free space.
> > >
> > > Given this is a maliciously corrupted filesystem image, this sort of
> > > warning is expected and there's probably nothing we can do to avoid
> > > it short of a full filesystem verification pass during mount.
> > > That's not a viable solution, so I think we should just ignore
> > > syzbot when it generates this sort of warning....
> >
> > If it's not a warning about a kernel bug, then WARN_ON should probably
> > be replaced by some more suitable reporting mechanism. Kernel coding
> > style document explicitly says:
> >
> > "WARN*() must not be used for a condition that is expected to trigger
> > easily, for example, by user space actions.
>
> That's exactly the case here. It should *never* happen in normal
> production workloads, and if it does then we have the *potential*
> for silent data loss occurring. That's *exactly* the sort of thing
> we should be warning admins about in no uncertain terms. Also, we
> use WARN_ON_ONCE(), so it's not going to spam the logs.
>
> syzbot is a malicious program - it is injecting broken stuff into
> the kernel as root to try to trigger situations like this. That
> doesn't make a warning it triggers bad or incorrect - syzbot is
> perturbing tightly coupled structures in a way that makes the
> information shared across those structures inconsistent and
> eventually the code is going to trip over that inconsistency.
>
> IOWs, once someone has used root permissions to mount a maliciously
> crafted filesystem image, *all bets are off*. The machine is running
> a potentially compromised kernel at this point. Hence it is almost
> guaranteed that at some point the kernel is going to discover things
> are *badly wrong* and start dumping "this should never happen!"
> warnings into the logs. That's what the warnings are supposed to do,
> and the fact that syzbot can trigger them doesn't make the warnings
> wrong.
>
> > pr_warn_once() is a
> > possible alternative, if you need to notify the user of a problem."
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?id=1e760fa3596e8c7f08412712c168288b79670d78#n1223
>
> It is worth remembering that those are guidelines, not enforceable
> rules and any experienced kernel developer will tell you the same
> thing. We know the guidelines, we know when to apply them, we know
> there are cases that the guidelines simply can't, don't or won't
> cover.

...and perhaps the WARNs that can result from corrupted metadata should
be changed to XFS_IS_CORRUPT() ?

We still get a kernel log entry about something going wrong, only now
the report doesn't set off everyone's WARN triggers, and we tell the
user to go run xfs_repair.

--D

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx