Re: WARNING in ovl_instantiate

From: Amir Goldstein
Date: Mon Dec 17 2018 - 08:30:29 EST

On Mon, Dec 17, 2018 at 12:47 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Sun, Dec 16, 2018 at 6:00 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Sat, Dec 15, 2018 at 9:34 PM syzbot
> > <syzbot+9c69c282adc4edd2b540@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > syzbot has found a reproducer for the following crash on:
> > >
> > > HEAD commit: d14b746c6c1c Add linux-next specific files for 20181214
> > > git tree: linux-next
> > > console output:
> > > kernel config:
> > > dashboard link:
> > > compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > > syz repro:
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+9c69c282adc4edd2b540@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > overlayfs: filesystem on './file0' not supported as upperdir
> > > overlayfs: filesystem on './file0' not supported as upperdir
> > > overlayfs: filesystem on './file0' not supported as upperdir
> > > overlayfs: filesystem on './file0' not supported as upperdir
> > > overlayfs: filesystem on './file0' not supported as upperdir
> > > WARNING: CPU: 1 PID: 28918 at fs/overlayfs/dir.c:263
> > > ovl_instantiate+0x369/0x400 fs/overlayfs/dir.c:263
> >
> > Looks like some corner case race when using same dir as upper and lower.
> > Doesn't look like a critical issue, I just don't know how to explain
> > getting to this state. Couldn't reproduce on my target machine.
> >
> > It would have been interesting for me to see the strace of the repro threads
> > when that WARN happens. I wonder if anyone else has already asked for it and
> > how hard would it be to make that information available with the bug report.
> Hi Amir,
> By strace you mean return values of syscalls, or something else?

I do mean return values.
Some of the commands in the repro are obviously going to fail and
some will fail conditionally depending on who wins the race.
It would help the analysis of the bug to know which syscall sequence
took place when the race happened.

> We had only 1 strace-related request, and it was related to better
> static decoding of inputs rather than dynamic behavior:
> I don't immediately see how to capture runtime behavior. It would work
> if we dump everything onto console right away. But this will produce
> tons of output (really lots). And that output will be intermixed
> across parallel processes. And it will be hard to understand exactly
> which syscalls participated in the process that provoked the crash.
> Or maybe it was exactly the syscalls from several processes that
> interacted. Lots
> of output can also slow down and perturb execution.

Yeah, I figured. Maybe the return values of syscalls are something that
syzkaller should cache and, in case of failure, report the recent run
sequences in a format similar to the repro program. Just a thought. Much
easier said than done.
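
To make the caching idea a bit more concrete, here is a rough sketch of a
bounded history of syscall results that gets dumped only on failure. It
is not syzkaller code: the SyscallHistory class, its record() and dump()
methods, the output format and the example calls are all made up for
illustration, and a real executor would have to do this per process with
far less overhead.

```python
import errno
import itertools
from collections import deque


class SyscallHistory:
    """Bounded log of recent syscall results (hypothetical, not syzkaller)."""

    def __init__(self, capacity=64):
        # A global sequence number lets entries from parallel
        # processes be ordered after the fact.
        self._seq = itertools.count()
        # deque(maxlen=...) silently drops the oldest entry when full,
        # so memory and console output stay bounded.
        self._ring = deque(maxlen=capacity)

    def record(self, proc, call, ret, err=0):
        """Remember one completed syscall and its result."""
        self._ring.append((next(self._seq), proc, call, ret, err))

    def dump(self):
        """Return the retained history, oldest first, one line per call."""
        lines = []
        for seq, proc, call, ret, err in self._ring:
            if err:
                result = f"-1 {errno.errorcode.get(err, err)}"
            else:
                result = str(ret)
            lines.append(f"#{seq} proc={proc} {call} = {result}")
        return lines


# Made-up calls, not the actual repro from this report.
history = SyscallHistory(capacity=3)
history.record(0, "mkdir('./file0')", 0)
history.record(1, "mount(overlay)", -1, errno.EINVAL)
history.record(0, "mkdir('./file0/f')", -1, errno.EEXIST)
history.record(1, "rename('./file0/f', './file0/g')", 0)
report = history.dump()
```

With a capacity of 3, the first mkdir has already been dropped by the
time of the dump, so only the last three calls would show up in the
report. Nothing is printed while the program runs normally, which avoids
the console flood described above.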

> But meanwhile I was able to reproduce this on the first run within 4
> minutes. Maybe you need to wait longer, it does not happen
> immediately.

Oh! I wonder if this type of information (how long it takes, or how many
repeats, before the crash happens) is available in the bug report and I
missed it. If not, it could be useful to add it.

Anyway, the reason that WARN_ON is there is that I wasn't sure if that
could happen. Apparently it can with this weird setup. Once I am able to
understand how it happens, most likely the result will be to convert the
WARN_ON to pr_warn. The user gets an error anyway, so there is probably
nothing to worry about (famous last words).