Re: [syzbot] WARNING in unsafe_follow_pfn

From: Dmitry Vyukov
Date: Tue Apr 13 2021 - 13:21:34 EST


On Thu, Apr 1, 2021 at 2:19 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Wed, Mar 31, 2021 at 07:29:22AM +0300, Dan Carpenter wrote:
> > On Tue, Mar 30, 2021 at 07:04:30PM +0200, Paolo Bonzini wrote:
> > > On 30/03/21 17:26, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 93129492 Add linux-next specific files for 20210326
> > > > git tree: linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=169ab21ad00000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=6f2f73285ea94c45
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=015dd7cdbbbc2c180c65
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=119b8d06d00000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=112e978ad00000
> > > >
> > > > The issue was bisected to:
> > > >
> > > > commit d40b9fdee6dc819d8fc35f70c345cbe0394cde4c
> > > > Author: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > > Date: Tue Mar 16 15:33:01 2021 +0000
> > > >
> > > > mm: Add unsafe_follow_pfn
> > > >
> > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=122d2016d00000
> > > > final oops: https://syzkaller.appspot.com/x/report.txt?x=112d2016d00000
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=162d2016d00000
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+015dd7cdbbbc2c180c65@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > Fixes: d40b9fdee6dc ("mm: Add unsafe_follow_pfn")
> > >
> > > This is basically intentional because get_vaddr_frames is broken, isn't it?
> > > I think it needs to be ignored in syzkaller.
> >
> > What?
> >
> > The bisect is wrong (because it's blaming the commit which added the
> > warning instead of the commit which added the buggy caller) but the
> > warning is correct.
> >
> > Plus users are going to be seeing this as well. According to the commit
> > message for 69bacee7f9ad ("mm: Add unsafe_follow_pfn") "Unfortunately
> > there's some users where this is not fixable (like v4l userptr of iomem
> > mappings)". It sort of seems crazy to dump this giant splat and then
> > tell users to ignore it forever because it can't be fixed... 0_0
>
> I think the discussion conclusion was that this interface should not
> be used by userspace anymore, it is obsolete by some new interface?
>
> It should be protected by some kconfig and the kconfig should be
> turned off for syzkaller runs.

If this is not a kernel bug, then it must not use WARN_ON[_ONCE]. It
makes the kernel untestable for both automated systems and humans:

https://lwn.net/Articles/769365/

<quote>
Greg Kroah-Hartman raised the problem of core kernel API code that
will use WARN_ON_ONCE() to complain about bad usage; that will not
generate the desired result if WARN_ON_ONCE() is configured to crash
the machine. He was told that the code should just call pr_warn()
instead, and that the called function should return an error in such
situations. It was generally agreed that any WARN_ON() or
WARN_ON_ONCE() calls that can be triggered from user space need to be
fixed.
</quote>


https://lore.kernel.org/netdev/20210413085522.2caee809@xxxxxxxxxxxxxxxxxx/
From: Steven Rostedt
<quote>

I agree. WARN_ON(_ONCE) should be reserved for anomalies that should not
happen ever. Anything that the user could trigger, should not trigger a
WARN_ON.

A WARN_ON is perfectly fine for detecting an accounting error inside the
kernel. I have them scattered all over my code, but they should never be
hit, even if something in user space tries to hit it. (with an exception of
an interface I want to deprecate, where I want to know if it's still being
used ;-) Of course, that wouldn't help bots testing the code. And I haven't
done that in years)

Any anomaly that can be triggered by user space doing something it should
not be doing really needs a pr_warn().
</quote>

And if it's a kernel bug reachable from user-space, then I think this
code should be removed entirely, not just on all testing systems. Or
otherwise if we are not removing it for some reason, then it needs to
be fixed.