RE: general protection fault in fib6_purge_rt
From: Jon Maloy
Date: Thu Mar 21 2019 - 09:55:49 EST
> -----Original Message-----
> From: netdev-owner@xxxxxxxxxxxxxxx <netdev-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Xin Long
> Sent: 21-Mar-19 13:41
> To: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>; syzbot
> <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>;
> davem@xxxxxxxxxxxxx; kuznet@xxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; syzkaller-
> bugs@xxxxxxxxxxxxxxxx; tipc-discussion@xxxxxxxxxxxxxxxxxxxxx;
> ying.xue@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx
> Subject: Re: general protection fault in fib6_purge_rt
>
> On Thu, Mar 21, 2019 at 4:53 PM Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Xin Long <lucien.xin@xxxxxxxxx>
> > > Sent: 20-Mar-19 20:09
> > > To: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>; syzbot
> > > <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>;
> > > davem@xxxxxxxxxxxxx; kuznet@xxxxxxxxxxxxx; linux-
> > > kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; syzkaller-
> > > bugs@xxxxxxxxxxxxxxxx; tipc-discussion@xxxxxxxxxxxxxxxxxxxxx;
> > > ying.xue@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx
> > > Subject: Re: general protection fault in fib6_purge_rt
> > >
> > > On Thu, Mar 21, 2019 at 12:54 AM Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > > > > Sent: 20-Mar-19 17:41
> > > > > To: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > > > Cc: syzbot
> > > > > <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>;
> > > > > davem@xxxxxxxxxxxxx; kuznet@xxxxxxxxxxxxx; linux-
> > > > > kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; syzkaller-
> > > > > bugs@xxxxxxxxxxxxxxxx; tipc-discussion@xxxxxxxxxxxxxxxxxxxxx;
> > > > > ying.xue@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx
> > > > > Subject: Re: general protection fault in fib6_purge_rt
> > > > >
> > > > > On Wed, Mar 20, 2019 at 4:59 PM Jon Maloy
> > > > > <jon.maloy@xxxxxxxxxxxx>
> > > > > wrote:
> > > > > >
> > > > > > > This one identifies the same culprit as
> > > > > > > syzbot+9d4c12bfd45a58738d0a@xxxxxxxxxxxxxxxxxxxxxxxxx, but points
> > > > > > > to a different bug.
> > > > > > > That bug has also been fixed, in commit adba75be0d23 ("tipc: fix
> > > > > > > lockdep warning when reinitilaizing sockets"), applied in 4.20 but
> > > > > > > not present in 4.16, the source of the dump.
> > > > > > > Once again, a dump from 4.20/5.0 might be a help.
> > > Hi, Jon,
> > >
> > > I was running the reproducer against the net.git kernel which
> > > includes commit adba75be0d23.
> > >
> > > Another panic showed up:
> > >
> > > [ 156.086487]
> > >
> ==========================================================
> > > ========
> > > [ 156.088228] BUG: KASAN: use-after-free in
> > > tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.089740] Read of size 8
> > > at addr ffff88802fdb1be8 by task swapper/1/0 [ 156.091120] [
> > > 156.091471] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0.test.syz
> > > #257 [ 156.092873] Hardware name: Red Hat KVM, BIOS
> > > seabios-1.7.5-8.el7 04/01/2014 [ 156.094315] Call Trace:
> > > [ 156.094844] <IRQ>
> > > [ 156.095306] dump_stack+0x7c/0xc0 [ 156.096040] ?
> > > tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.097346]
> > > print_address_description+0x65/0x22e
> > > [ 156.098360] ? tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.099408] ?
> > > tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.100445]
> > > kasan_report.cold.3+0x37/0x7a [ 156.101348] ?
> > > tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.102402]
> > > tipc_disc_timeout+0x9c9/0xb20 [tipc] [ 156.103641] ?
> > > tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc] [ 156.104830] ?
> > > __lock_is_held+0xb4/0x140 [ 156.105669] ? call_timer_fn+0xd1/0x610
> > > [ 156.106517] call_timer_fn+0x19a/0x610 [ 156.107342] ?
> > > tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc] [ 156.108538] ?
> > > timer_fixup_init+0x30/0x30 [ 156.109411] ?
> > > _raw_spin_unlock_irq+0x29/0x40 [ 156.110343] ?
> > > tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc] [ 156.111545] ?
> > > tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc] [ 156.112749]
> > > run_timer_softirq+0xb51/0x1090 [ 156.113656] ?
> > > add_timer+0x8d0/0x8d0 [ 156.114433] ?
> kvm_sched_clock_read+0x14/0x30 [ 156.115355] ?
> > > sched_clock+0x5/0x10 [ 156.116124] __do_softirq+0x236/0xa1c [
> > > 156.116943] irq_exit+0x281/0x2d0 [ 156.117657]
> > > smp_apic_timer_interrupt+0x172/0x5d0
> > > [ 156.118658] apic_timer_interrupt+0xf/0x20
> > >
> > >
> > > > I think it's caused by d->timer not being deleted when the netns
> > > > was destroyed, so tipc_disc_timeout() still used d->net after it
> > > > had been freed.
> > >
> > > I looked at the __net_exit path, it should have been done by:
> > > tipc_exit_net() ->
> > > tipc_net_stop()->
> > > tipc_bearer_stop()->
> > > bearer_disable()->
> > > tipc_disc_delete()->
> > > del_timer_sync(&d->timer)
> > >
> > > but because of if (!self), it returned in tipc_net_stop().
> > >
> > > It seems to me that whether to do tipc_bearer/node_stop() for netns
> > > or not shouldn't depend on tipc_net(net)->node_addr.
> > > Can we just remove that if(!self) from tipc_net_stop() to fix it?
> >
> > > That would probably work. Previous to the problematic commit, (!self)
> > > just meant that we had never entered network mode, and that there was
> > > nothing to stop or delete. That changed when this patch introduced the
> > > address negotiation period. So, if somebody leaves network mode before
> > > the hash address has been set, this will happen.
> But even previous to commit 52dfae5c85, if TIPC_NLA_NET_NODEID is set by
> netlink, tn->node_id will be set while tn->node_addr is still 0.
> Bearers/nodes can then be allocated in tipc_enable_bearer(), so the panic
> would be triggered, right?
Yes. You are right.
>
> >
> > > My concern is that we might run into surprises when we continue into the
> > > later functions, such as tipc_bearer_stop(), so I would prefer to avoid that.
> > > The safer approach would be to instead test for if (!tipc_own_id(net)),
> > > which now serves as a safe indicator of whether we have entered network
> > > mode or not.
> okay, as long as no node/bearer can be allocated when node_id is not set
> yet. :)
Yes, that is the case.
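
For illustration only, here is a minimal user-space sketch of the two guards
discussed above. It is not kernel code: struct model_net and every model_*
function are hypothetical stand-ins, and the only assumptions taken from this
thread are that node_id can be set while node_addr is still 0, and that a
bearer (with its discovery timer) can only be enabled once node_id is set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for per-netns TIPC state; not kernel code. */
struct model_net {
    uint8_t  node_id[16];   /* set as soon as a node identity is configured */
    uint32_t node_addr;     /* stays 0 until address negotiation completes */
    bool     timer_armed;   /* stands in for the discoverer's d->timer */
};

/* True once any byte of the node identity is non-zero. */
static bool model_own_id_set(const struct model_net *n)
{
    static const uint8_t zero[16];
    return memcmp(n->node_id, zero, sizeof(zero)) != 0;
}

/* Enabling a bearer arms the discovery timer; refused until node_id is set. */
static bool model_enable_bearer(struct model_net *n)
{
    if (!model_own_id_set(n))
        return false;
    n->timer_armed = true;
    return true;
}

/* Old guard: skips teardown whenever node_addr is still 0. */
static void model_net_stop_addr_guard(struct model_net *n)
{
    if (!n->node_addr)
        return;
    n->timer_armed = false;
}

/* Proposed guard: skips teardown only if no node identity was ever set. */
static void model_net_stop_id_guard(struct model_net *n)
{
    if (!model_own_id_set(n))
        return;
    n->timer_armed = false;
}

/* Returns true if a timer is still armed after stop, i.e. the leak case. */
static bool model_timer_leaks(bool use_id_guard)
{
    struct model_net n = {0};

    n.node_id[0] = 1;       /* identity set, address not yet negotiated */
    model_enable_bearer(&n);
    if (use_id_guard)
        model_net_stop_id_guard(&n);
    else
        model_net_stop_addr_guard(&n);
    return n.timer_armed;
}
```

In this model the address-based guard leaves the timer armed when stop is
called during the negotiation period, while the identity-based guard tears it
down. Whether the real code paths behave the same way still needs to be
checked against the tree.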
///jon
>
> >
> > > > Also, it seems tipc_nametbl_stop() will do the cleanup for
> > > > nametbl; should tipc_nametbl_withdraw() also be removed from
> > > > tipc_net_stop()?
> >
> > Yes. This looks like legacy from the previous implementation.
> >
> > ///jon
> >
> > >
> > > > diff --git a/net/tipc/net.c b/net/tipc/net.c
> > > > index f076edb..3647984 100644
> > > > --- a/net/tipc/net.c
> > > > +++ b/net/tipc/net.c
> > > > @@ -163,12 +163,6 @@ void tipc_sched_net_finalize(struct net *net, u32 addr)
> > > >
> > > >  void tipc_net_stop(struct net *net)
> > > >  {
> > > > -	u32 self = tipc_own_addr(net);
> > > > -
> > > > -	if (!self)
> > > > -		return;
> > > > -
> > > > -	tipc_nametbl_withdraw(net, TIPC_CFG_SRV, self, self, self);
> > > >  	rtnl_lock();
> > > >  	tipc_bearer_stop(net);
> > > >  	tipc_node_stop(net);
> > >
> > > > >
> > > > >
> > > > > Looking at the bisection log maybe this reproducer triggers
> > > > > multiple kernel bugs.
> > > >
> > > > I think so.
> > > >
> > > > > All crashes including the latest ones and other info are always
> > > > > available on the dashboard.
> > > >
> > > > > Looking at the latest dashboard reports, I don't see anything that
> > > > > points to TIPC.
> > > >
> > > > ///jon
> > > >
> > > >
> > > > >
> > > > >
> > > > > > ///jon
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > > From: syzbot <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>
> > > > > > > > Sent: 18-Mar-19 08:28
> > > > > > > > To: davem@xxxxxxxxxxxxx; Jon Maloy <jon.maloy@xxxxxxxxxxxx>;
> > > > > > > > kuznet@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > > > > > netdev@xxxxxxxxxxxxxxx; syzkaller-bugs@xxxxxxxxxxxxxxxx;
> > > > > > > > tipc-discussion@xxxxxxxxxxxxxxxxxxxxx; ying.xue@xxxxxxxxxxxxx;
> > > > > > > > yoshfuji@xxxxxxxxxxxxxx
> > > > > > > Subject: Re: general protection fault in fib6_purge_rt
> > > > > > >
> > > > > > > syzbot has bisected this bug to:
> > > > > > >
> > > > > > > commit 52dfae5c85a4c1078e9f1d5e8947d4a25f73dd81
> > > > > > > Author: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > > > > > Date: Thu Mar 22 19:42:52 2018 +0000
> > > > > > >
> > > > > > > tipc: obtain node identity from interface by default
> > > > > > >
> > > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1116d2a3200000
> > > > > > > > start commit: 52dfae5c tipc: obtain node identity from interface by defa..
> > > > > > > > git tree: linux-next
> > > > > > > > final crash: https://syzkaller.appspot.com/x/report.txt?x=1316d2a3200000
> > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1516d2a3200000
> > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c8b6073d992e8217
> > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=a25307ad099309f1c2b9
> > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16b2c56f200000
> > > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b8890b200000
> > > > > > >
> > > > > > > > Reported-by: syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > > > Fixes: 52dfae5c ("tipc: obtain node identity from interface by default")
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google
> > > > > Groups "syzkaller-bugs" group.
> > > > > > To unsubscribe from this group and stop receiving emails from
> > > > > > it, send an
> > > > > email to syzkaller-bugs+unsubscribe@xxxxxxxxxxxxxxxxx
> > > > > > To view this discussion on the web visit
> > > > > https://groups.google.com/d/msgid/syzkaller-
> > > > >
> > >
> bugs/BL0PR1501MB20039998B662DCC11E2B38D79A410%40BL0PR1501MB200
> > > > > 3.namprd15.prod.outlook.com.
> > > > > > For more options, visit https://groups.google.com/d/optout.