Re: [Cluster-devel] general protection fault in gfs2_withdraw
From: Bob Peterson
Date: Mon Sep 28 2020 - 09:52:18 EST
----- Original Message -----
> On 26/09/2020 18:21, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 7c7ec322 Merge tag 'for-linus' of
> > git://git.kernel.org/pub..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11f2ff27900000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=6184b75aa6d48d66
> > dashboard link: https://syzkaller.appspot.com/bug?extid=50a8a9cf8127f2c6f5df
> > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=160fb773900000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1104f109900000
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the
> > commit:
> > Reported-by: syzbot+50a8a9cf8127f2c6f5df@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > gfs2: fsid=syz:syz.0: fatal: invalid metadata block
> > bh = 2072 (magic number)
> > function = gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 417
> > gfs2: fsid=syz:syz.0: about to withdraw this file system
> > general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
> > CPU: 0 PID: 6842 Comm: syz-executor264 Not tainted 5.9.0-rc6-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:signal_our_withdraw fs/gfs2/util.c:97 [inline]
>
> Seems that it's withdrawing in the init_inodes() path early enough
> (while looking up the jindex) that sdp->sd_jdesc is still NULL here:
>
> static void signal_our_withdraw(struct gfs2_sbd *sdp)
> {
>         struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
>         struct inode *inode = sdp->sd_jdesc->jd_inode;
>
> I'm undecided as to whether the bug is that we're withdrawing that early
> at all, or that we're not checking for NULL there?
>
> Probably introduced by:
>
> 601ef0d52e96 gfs2: Force withdraw to replay journals and wait for it to finish
>
> Andy

Hi Andy,

Thanks for your analysis. I suspect you're right.

It's probably another exception to the rule. We knew there would be a few of
those with 601ef0d52e96, such as the one we made for "withdrawing during withdraw".
We should probably just add a check for NULL and make it do the right thing.
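
For illustration only, here is a minimal userspace sketch of the kind of NULL check being discussed. It is not the actual kernel fix. The structs are cut-down stand-ins for the real gfs2 types, and the helpers withdraw_journal_inode() and withdraw_can_replay() are hypothetical names that do not exist in fs/gfs2. What "the right thing" should be when no journal is assigned yet is left open in this thread; the sketch simply assumes the journal-related steps would be skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct inode { unsigned long i_ino; };
struct gfs2_jdesc { struct inode *jd_inode; };
struct gfs2_sbd { struct gfs2_jdesc *sd_jdesc; };

/*
 * Hypothetical helper (not in the kernel): return the journal inode, or
 * NULL if the withdraw happens before a journal descriptor was assigned,
 * as in the early init_inodes() case described above.
 */
static struct inode *withdraw_journal_inode(const struct gfs2_sbd *sdp)
{
	assert(sdp != NULL);
	if (sdp->sd_jdesc == NULL)
		return NULL;
	return sdp->sd_jdesc->jd_inode;
}

/*
 * Hypothetical helper: returns 1 if there is a journal inode to work
 * with, 0 if the withdraw is too early and the journal-related steps
 * would have to be skipped (an assumption of this sketch).
 */
static int withdraw_can_replay(const struct gfs2_sbd *sdp)
{
	return withdraw_journal_inode(sdp) != NULL;
}
```

The point of routing the access through one helper is that signal_our_withdraw() would never dereference sdp->sd_jdesc directly, so an early withdraw takes the "no journal" path instead of faulting.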

Regards,
Bob Peterson