Re: BUG: stack guard page was hit in unwind_next_frame

From: Cong Wang
Date: Tue May 05 2020 - 03:02:30 EST


On Mon, May 4, 2020 at 6:06 PM Cong Wang <xiyou.wangcong@xxxxxxxxx> wrote:
>
> On Mon, May 4, 2020 at 12:08 PM Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> >
> > On Sat, May 02, 2020 at 11:36:11PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 8999dc89 net/x25: Fix null-ptr-deref in x25_disconnect
> > > git tree: net
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16004440100000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e73ceacfd8560cc8a3ca
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+e73ceacfd8560cc8a3ca@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > Infinite loop in network code.
>
> It is not a loop, it is an unbounded recursion where netdev events
> bounce back and forth between the bond master and slave.
>
> Let me see how this can be fixed properly.

The following patch works for me. I think it is reasonable to stop
netdev event propagation from the upper device to the lower device,
but I am not sure whether this will cause netdev events to be missed
in complex multi-layer setups.

diff --git a/net/core/dev.c b/net/core/dev.c
index 522288177bbd..ece50ae346c3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8907,7 +8907,7 @@ static void netdev_sync_lower_features(struct net_device *upper,
 			netdev_dbg(upper, "Disabling feature %pNF on lower dev %s.\n",
 				   &feature, lower->name);
 			lower->wanted_features &= ~feature;
-			netdev_update_features(lower);
+			__netdev_update_features(lower);
 
 			if (unlikely(lower->features & feature))
 				netdev_WARN(upper, "failed to disable %pNF on %s!\n",