Re: [syzbot] [gfs2?] WARNING in filename_mkdirat

From: Andreas Gruenbacher

Date: Mon Feb 23 2026 - 10:37:16 EST

On Thu, Feb 19, 2026 at 10:45 PM NeilBrown <neilb@xxxxxxxxxxx> wrote:
> On Thu, 19 Feb 2026, Christian Brauner wrote:
> > On Wed, Feb 18, 2026 at 09:18:53AM +1100, NeilBrown wrote:
> > > On Tue, 17 Feb 2026, NeilBrown wrote:
> > > > On Tue, 17 Feb 2026, Christian Brauner wrote:
> > > > > On Mon, Feb 16, 2026 at 04:30:27PM -0800, syzbot wrote:
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following issue on:
> > > > > >
> > > > > > HEAD commit: 0f2acd3148e0 Merge tag 'm68knommu-for-v7.0' of git://git.k..
> > > > > > git tree: upstream
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15331c02580000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=ac00553de86d6bf0
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0ea5108a1f5fb4fcc2d8
> > > > > > compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=146b295a580000
> > > > > >
> > > > > > Downloadable assets:
> > > > > > disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-0f2acd31.raw.xz
> > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/b7d134e71e9c/vmlinux-0f2acd31.xz
> > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/b18643058ceb/bzImage-0f2acd31.xz
> > > > > > mounted in repro: https://storage.googleapis.com/syzbot-assets/bbfed09077d3/mount_1.gz
> > > > > > fsck result: OK (log: https://syzkaller.appspot.com/x/fsck.log?x=106b295a580000)
> > > > > >
> > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > Reported-by: syzbot+0ea5108a1f5fb4fcc2d8@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > >
> > > > > Neil, is this something you have time to look into?
> > > >
> > > > The reproducer appears to mount a gfs2 filesystem and mkdir 3
> > > > directories:
> > > > ./file1
> > > > ./file1/file4
> > > > ./file1/file4/file7
> > > >
> > > > and somewhere in there it crashes because vfs_mkdir() returns a
> > > > non-error dentry for which ->d_parent->d_inode is not locked and
> > > > end_creating_path() tries to up_write().
> > > >
> > > > Presumably either ->d_parent has changed or the inode was unlocked?
> > > >
> > > > gfs2_mkdir() never returns a dentry, so it must be returning NULL.
> > > >
> > > > It's weird - but that is no surprise.
> > > >
> > > > I'll try building a kernel myself and see if the reproducer still fires.
> > > > If so, some printk tracing may reveal something.
> > >
> > > Unfortunately that didn't work out.
> > > Using the provided vmlinux and root image and repro, and a syzkaller I
> > > compiled from current git, I cannot trigger the crash.
> > >
> > > I'll have another look at the code but I don't hold out a lot of hope.
> >
> > There's at least a proper C repro now.
> >
>
> Yes - and with the new C repro I can trigger the bug.
>
> The problem is in gfs2. gfs2_create_inode() calls d_instantiate()
> before unlock_new_inode(). This is bad. d_instantiate_new() should be
> used, which makes sure the two things happen in the correct order.
>
> Key to understanding the problem is knowing that unlock_new_inode()
> calls lockdep_annotate_inode_mutex_key() which (potentially) calls
> init_rwsem(&inode->i_rwsem);
>
> So if anyone has locked the inode before unlock_new_inode() is called,
> the lock is lost when i_rwsem is reinitialised.
>
> The reproducer calls mkdir("a") and mkdir("a/b") concurrently from
> separate threads. The second mkdir() often fails (I assume) because "a"
> cannot be found. But if that second mkdir() runs just after gfs2 has
> called d_instantiate(), then the lookup of "a" will succeed and so the
> inode will be locked, ready for mkdir. Then the mkdir("a") completes,
> calling unlock_new_inode(), which reinitialises i_rwsem. When
> mkdir("a/b") comes to lock the parent, it finds that it isn't locked any
> more.
>
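
To make the interleaving described above concrete, here is a small
user-space sketch of it. It is a toy model and not kernel code: every
toy_* name is made up for illustration, the "rwsem" is a single flag,
and the two mkdir() callers are replayed in one fixed order rather than
actually racing. toy_unlock_new_inode() is assumed to always
reinitialise the lock, whereas the real lockdep annotation only does so
potentially.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for i_rwsem: only tracks whether a writer holds it. */
struct toy_rwsem {
	bool write_locked;
};

/* Toy stand-in for an inode; "visible" means a lookup can find it. */
struct toy_inode {
	struct toy_rwsem i_rwsem;
	bool is_new;
	bool visible;
};

static void toy_init_rwsem(struct toy_rwsem *sem)
{
	sem->write_locked = false;
}

static void toy_down_write(struct toy_rwsem *sem)
{
	assert(!sem->write_locked);
	sem->write_locked = true;
}

/* Returns false where the kernel would WARN: the lock is not held. */
static bool toy_up_write(struct toy_rwsem *sem)
{
	if (!sem->write_locked)
		return false;
	sem->write_locked = false;
	return true;
}

static void toy_d_instantiate(struct toy_inode *inode)
{
	inode->visible = true;
}

/* Assumption: always reinitialises the lock, as lockdep may do. */
static void toy_unlock_new_inode(struct toy_inode *inode)
{
	toy_init_rwsem(&inode->i_rwsem);
	inode->is_new = false;
}

/* Models the safe ordering: reinitialise first, publish second. */
static void toy_d_instantiate_new(struct toy_inode *inode)
{
	toy_unlock_new_inode(inode);
	toy_d_instantiate(inode);
}

/*
 * Replays mkdir("a") and mkdir("a/b") in the problematic order.
 * Returns true if mkdir("a/b") can still unlock its parent at the end.
 */
static bool toy_race(bool use_instantiate_new)
{
	struct toy_inode a = { .is_new = true };
	bool locked_by_b = false;

	toy_init_rwsem(&a.i_rwsem);

	/* mkdir("a"): publish the dentry, safely or too early. */
	if (use_instantiate_new)
		toy_d_instantiate_new(&a);
	else
		toy_d_instantiate(&a);

	/* mkdir("a/b"): the lookup of "a" succeeds, so lock the parent. */
	if (a.visible) {
		toy_down_write(&a.i_rwsem);
		locked_by_b = true;
	}

	/* mkdir("a"): the late unlock_new_inode() wipes the held lock. */
	if (!use_instantiate_new)
		toy_unlock_new_inode(&a);

	/* mkdir("a/b"): finish and unlock the parent. */
	return locked_by_b && toy_up_write(&a.i_rwsem);
}
```

In this model toy_race(false) returns false, because the final unlock
finds the lock no longer held, while toy_race(true) returns true. The
point is only the ordering: the lock must be reinitialised before
anyone else can find the inode and lock it.
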
> There is non-trivial code between the d_instantiate() call and the
> unlock_new_inode() call which I do not understand. So I will not
> propose a patch. I don't know if that code should be after
> d_instantiate_new(), or before it.
>
> So I'll leave that to Andreas.

Thanks a lot, Neil, I've added a fix to for-next.

Andreas