Re: WARNING: bad unlock balance in ovl_llseek

From: Dmitry Vyukov
Date: Mon Mar 02 2020 - 07:35:05 EST


On Mon, Mar 2, 2020 at 1:10 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > On Sun, Mar 1, 2020 at 9:13 PM syzbot
> > > <syzbot+66a9752fa927f745385e@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: f8788d86 Linux 5.6-rc3
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=13c5f8f9e00000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=5d2e033af114153f
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=66a9752fa927f745385e
> > > > compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=131d9a81e00000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=14117a81e00000
> > > >
> > >
> > > Dmitry,
> > >
> > > There is something strange about the C repro.
> > > It passes an invalid address for the first arg of mount syscall:
> > >
> > > syscall(__NR_mount, 0x400000ul, 0x20000000ul, 0x20000080ul, 0ul,
> > > 0x20000100ul);
> > >
> > > With this address mount syscall returns -EFAULT on my system.
> > > I fixed this manually, but repro did not trigger the reported bug on my system.
> >
> > Hi Amir,
> >
> > This is not strange in the context of fuzzer, it's goal is to pass
> > random data. Generally if it says 0x400000ul, that's what it is, don't
> > fix it, or you are running a different program that may not reproduce
> > the bug. If syzbot attaches a reproducer, the bug was triggered by
> > precisely this program.
> >
>
> What's strange it that a bug in overlay code cannot be triggered if overlay
> isn't mounted and as it is the repro couldn't mount overlayfs at all, at
> lease with my kernel config.

Can it depend on kernel config? The bug was triggered by the program
provided somehow.

Separate question: why is it failing? Isn't src unused for overlayfs?
Where/how does vfs code look at src?

> The bounds check that causes mount failure is in vfs code, not in
> overlayfs code,
> so not sure what exactly went on there.
>
>
> > The reason why it passes non-pointers here is we think the src
> > argument of overlay mount is unused:
> > https://github.com/google/syzkaller/blob/4a4e0509de520c7139ca2b5606712cbadc550db2/sys/linux/filesystem.txt#L12
> > If it's not true, it needs to be fixed (or almost all overlay mounts
> > fail with EFAULT during fuzzing).
> >
> >
> > > > The bug was bisected to:
> > > >
> > > > commit b1f9d3858f724ed45b279b689fb5b400d91352e3
> > > > Author: Amir Goldstein <amir73il@xxxxxxxxx>
> > > > Date: Sat Dec 21 09:42:29 2019 +0000
> > > >
> > > > ovl: use ovl_inode_lock in ovl_llseek()
> > > >
> > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=16ff3bede00000
> > > > final crash: https://syzkaller.appspot.com/x/report.txt?x=15ff3bede00000
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11ff3bede00000
> > > >
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+66a9752fa927f745385e@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > Fixes: b1f9d3858f72 ("ovl: use ovl_inode_lock in ovl_llseek()")
> > > >
> > > > =====================================
> > > > WARNING: bad unlock balance detected!
> > > > 5.6.0-rc3-syzkaller #0 Not tainted
> > > > -------------------------------------
> > > > syz-executor194/8947 is trying to release lock (&ovl_i_lock_key[depth]) at:
> > > > [<ffffffff828b7835>] ovl_inode_unlock fs/overlayfs/overlayfs.h:328 [inline]
> > > > [<ffffffff828b7835>] ovl_llseek+0x215/0x2c0 fs/overlayfs/file.c:193
> > > > but there are no more locks to release!
> > > >
> > >
> > > This is strange. I don't see how that can happen nor how my change would
> > > have caused this regression. If anything, the lock chance may have brought
> > > a bug in stack file ops to light, but don't see the bug.
> > >
> > > The repro is multi-threaded but when I ran the repro, a single thread did:
> > > - open lower file (pre copy up)
> > > - lchown file (copy up)
> > > - llseek the open file (so llseek is on a temporary ovl_open_realfile())
> > >
> > > Perhaps when bug was triggered ops above were executed by different
> > > threads?
> >
> > Perfectly possible.
> >
> > > Dmitry, I may have asked this before - how hard would it be to attach an
> > > strace of the repro to a bug report?
> >
> > This is tracked in https://github.com/google/syzkaller/issues/197 but
> > no progress so far.
> > What exactly were the main pain points in this case? But note that
> > strace is not atomic with actual execution, so it may lead you down
> > even worse rabbit hole...
>
> Sure, but it can add more insight for analysis.
>
> Thanks,
> Amir.