Re: general protection fault in fib6_purge_rt

From: Xin Long
Date: Thu Mar 21 2019 - 08:41:06 EST


On Thu, Mar 21, 2019 at 4:53 PM Jon Maloy <jon.maloy@xxxxxxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Xin Long <lucien.xin@xxxxxxxxx>
> > Sent: 20-Mar-19 20:09
> > To: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>; syzbot
> > <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>;
> > davem@xxxxxxxxxxxxx; kuznet@xxxxxxxxxxxxx; linux-
> > kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; syzkaller-
> > bugs@xxxxxxxxxxxxxxxx; tipc-discussion@xxxxxxxxxxxxxxxxxxxxx;
> > ying.xue@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx
> > Subject: Re: general protection fault in fib6_purge_rt
> >
> > On Thu, Mar 21, 2019 at 12:54 AM Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> > > > Sent: 20-Mar-19 17:41
> > > > To: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > > Cc: syzbot <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>;
> > > > davem@xxxxxxxxxxxxx; kuznet@xxxxxxxxxxxxx; linux-
> > > > kernel@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; syzkaller-
> > > > bugs@xxxxxxxxxxxxxxxx; tipc-discussion@xxxxxxxxxxxxxxxxxxxxx;
> > > > ying.xue@xxxxxxxxxxxxx; yoshfuji@xxxxxxxxxxxxxx
> > > > Subject: Re: general protection fault in fib6_purge_rt
> > > >
> > > > On Wed, Mar 20, 2019 at 4:59 PM Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > > wrote:
> > > > >
> > > > > > This one identifies the same culprit as
> > > > > > syzbot+9d4c12bfd45a58738d0a@xxxxxxxxxxxxxxxxxxxxxxxxx, but points to a
> > > > > > different bug.
> > > > > > That bug has also been fixed, in commit adba75be0d23 ("tipc: fix
> > > > > > lockdep warning when reinitilaizing sockets"), applied in 4.20 but not
> > > > > > present in 4.16, the source of the dump.
> > > > > > Once again, a dump from 4.20/5.0 might be a help.
> > Hi, Jon,
> >
> > I was running the reproducer against the net.git kernel which includes
> > commit adba75be0d23.
> >
> > Another panic showed up:
> >
> > [ 156.086487] ==================================================================
> > [ 156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.089740] Read of size 8 at addr ffff88802fdb1be8 by task swapper/1/0
> > [ 156.091120]
> > [ 156.091471] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0.test.syz #257
> > [ 156.092873] Hardware name: Red Hat KVM, BIOS seabios-1.7.5-8.el7 04/01/2014
> > [ 156.094315] Call Trace:
> > [ 156.094844] <IRQ>
> > [ 156.095306] dump_stack+0x7c/0xc0
> > [ 156.096040] ? tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.097346] print_address_description+0x65/0x22e
> > [ 156.098360] ? tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.099408] ? tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.100445] kasan_report.cold.3+0x37/0x7a
> > [ 156.101348] ? tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.102402] tipc_disc_timeout+0x9c9/0xb20 [tipc]
> > [ 156.103641] ? tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc]
> > [ 156.104830] ? __lock_is_held+0xb4/0x140
> > [ 156.105669] ? call_timer_fn+0xd1/0x610
> > [ 156.106517] call_timer_fn+0x19a/0x610
> > [ 156.107342] ? tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc]
> > [ 156.108538] ? timer_fixup_init+0x30/0x30
> > [ 156.109411] ? _raw_spin_unlock_irq+0x29/0x40
> > [ 156.110343] ? tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc]
> > [ 156.111545] ? tipc_disc_msg_xmit.isra.19+0x180/0x180 [tipc]
> > [ 156.112749] run_timer_softirq+0xb51/0x1090
> > [ 156.113656] ? add_timer+0x8d0/0x8d0
> > [ 156.114433] ? kvm_sched_clock_read+0x14/0x30
> > [ 156.115355] ? sched_clock+0x5/0x10
> > [ 156.116124] __do_softirq+0x236/0xa1c
> > [ 156.116943] irq_exit+0x281/0x2d0
> > [ 156.117657] smp_apic_timer_interrupt+0x172/0x5d0
> > [ 156.118658] apic_timer_interrupt+0xf/0x20
> >
> >
> > I think it happens because d->timer wasn't deleted before the netns was
> > destroyed, so tipc_disc_timeout() still used d->net after it had been freed.
> >
> > I looked at the __net_exit path; the timer should have been deleted by:
> > tipc_exit_net() ->
> > tipc_net_stop()->
> > tipc_bearer_stop()->
> > bearer_disable()->
> > tipc_disc_delete()->
> > del_timer_sync(&d->timer)
> >
> > but because of the if (!self) check, tipc_net_stop() returned early.
> >
> > It seems to me that whether to do tipc_bearer/node_stop() for a netns
> > shouldn't depend on tipc_net(net)->node_addr.
> > Can we just remove that if (!self) from tipc_net_stop() to fix it?
>
> That would probably work. Previous to the problematic commit, (!self) just meant that we had never entered
> network mode, and that there was nothing to stop or delete. That changed when this patch introduced
> the address negotiation period. So, if somebody leaves network mode before the hash address has been set, this will happen.
But even before commit 52dfae5c85, if TIPC_NLA_NET_NODEID is set
by netlink, tn->node_id will be set while tn->node_addr is still 0.
Bearers/nodes can then be allocated in tipc_enable_bearer(), so the
panic would be triggered, right?

>
> My concern is that we might run into surprises when we continue into the later functions, such as tipc_bearer_stop(), so I would prefer to avoid that.
> The safer approach would be to test for if (!tipc_own_id(net)) instead, which now serves as a safe indicator of whether we have entered network mode or not.
okay, as long as no node/bearer can be allocated when node_id is not set yet. :)
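
To make the three options concrete (the current if (!self) guard, dropping the guard, and testing the node id instead), here is a rough userspace model. It is only a sketch, not kernel code: every model_* name is made up for illustration, an all-zero node_id stands in for "tipc_own_id() reports no identity", and a boolean stands in for d->timer being armed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NODE_ID_LEN 16

/* Simplified stand-in for the per-netns TIPC state (hypothetical, not struct tipc_net). */
struct model_net {
	uint8_t  node_id[NODE_ID_LEN]; /* identity, set via netlink or from the interface */
	uint32_t node_addr;            /* hash address, 0 until address negotiation ends */
	bool     disc_timer_armed;     /* stands in for d->timer of an enabled bearer */
};

/* Assumption: an all-zero id means "no identity set yet". */
static bool model_own_id_set(const struct model_net *n)
{
	static const uint8_t zero[NODE_ID_LEN];

	return memcmp(n->node_id, zero, NODE_ID_LEN) != 0;
}

/* Stand-in for tipc_bearer_stop() -> ... -> del_timer_sync(&d->timer). */
static void model_bearer_stop(struct model_net *n)
{
	n->disc_timer_armed = false;
}

/* Current behaviour: return early when no address has been set. */
static void model_net_stop_addr_guard(struct model_net *n)
{
	if (!n->node_addr)
		return;
	model_bearer_stop(n);
}

/* Jon's suggestion: guard on the node identity instead of the address. */
static void model_net_stop_id_guard(struct model_net *n)
{
	if (!model_own_id_set(n))
		return;
	model_bearer_stop(n);
}

/* The patch in this thread: no guard at all. */
static void model_net_stop_unguarded(struct model_net *n)
{
	model_bearer_stop(n);
}

/*
 * Scenario from the report: identity is set and a bearer is enabled, but the
 * netns goes away before the hash address is assigned.
 * Returns true if the timer would still be armed when the netns is freed.
 */
static bool model_timer_leaks(void (*stop)(struct model_net *))
{
	struct model_net n = { .node_addr = 0, .disc_timer_armed = true };

	n.node_id[0] = 1;
	stop(&n);
	return n.disc_timer_armed;
}
```

In this model only the address guard leaves the timer armed; both the unguarded version and the id guard stop it, which is why the id guard is only safe as long as no bearer can be enabled before node_id is set.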

>
> > Also, it seems tipc_nametbl_stop() will do the cleanup for nametbl, so should
> > tipc_nametbl_withdraw() also be removed from tipc_net_stop()?
>
> Yes. This looks like legacy from the previous implementation.
>
> ///jon
>
> >
> > diff --git a/net/tipc/net.c b/net/tipc/net.c
> > index f076edb..3647984 100644
> > --- a/net/tipc/net.c
> > +++ b/net/tipc/net.c
> > @@ -163,12 +163,6 @@ void tipc_sched_net_finalize(struct net *net, u32 addr)
> >
> >  void tipc_net_stop(struct net *net)
> >  {
> > -        u32 self = tipc_own_addr(net);
> > -
> > -        if (!self)
> > -                return;
> > -
> > -        tipc_nametbl_withdraw(net, TIPC_CFG_SRV, self, self, self);
> >          rtnl_lock();
> >          tipc_bearer_stop(net);
> >          tipc_node_stop(net);
> >
> > > >
> > > >
> > > > Looking at the bisection log maybe this reproducer triggers multiple
> > > > kernel bugs.
> > >
> > > I think so.
> > >
> > > > All crashes including the latest ones and other info are always
> > > > available on the dashboard.
> > >
> > > Looking at the latest dashboard reports, I don't see anything that points to TIPC.
> > >
> > > ///jon
> > >
> > >
> > > >
> > > >
> > > > > ///jon
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > > From: syzbot <syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx>
> > > > > > Sent: 18-Mar-19 08:28
> > > > > > To: davem@xxxxxxxxxxxxx; Jon Maloy <jon.maloy@xxxxxxxxxxxx>;
> > > > > > kuznet@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> > > > > > > netdev@xxxxxxxxxxxxxxx; syzkaller-bugs@xxxxxxxxxxxxxxxx;
> > > > > > > tipc-discussion@xxxxxxxxxxxxxxxxxxxxx; ying.xue@xxxxxxxxxxxxx;
> > > > > > > yoshfuji@linux-ipv6.org
> > > > > > Subject: Re: general protection fault in fib6_purge_rt
> > > > > >
> > > > > > syzbot has bisected this bug to:
> > > > > >
> > > > > > commit 52dfae5c85a4c1078e9f1d5e8947d4a25f73dd81
> > > > > > Author: Jon Maloy <jon.maloy@xxxxxxxxxxxx>
> > > > > > Date: Thu Mar 22 19:42:52 2018 +0000
> > > > > >
> > > > > > tipc: obtain node identity from interface by default
> > > > > >
> > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1116d2a3200000
> > > > > > > start commit: 52dfae5c tipc: obtain node identity from interface by defa..
> > > > > > > git tree: linux-next
> > > > > > > final crash: https://syzkaller.appspot.com/x/report.txt?x=1316d2a3200000
> > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1516d2a3200000
> > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c8b6073d992e8217
> > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=a25307ad099309f1c2b9
> > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16b2c56f200000
> > > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b8890b200000
> > > > > >
> > > > > > Reported-by:
> > > > > > syzbot+a25307ad099309f1c2b9@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > > > Fixes: 52dfae5c ("tipc: obtain node identity from interface by
> > > > > > default")
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the Google
> > > > Groups "syzkaller-bugs" group.
> > > > > To unsubscribe from this group and stop receiving emails from it,
> > > > > send an
> > > > email to syzkaller-bugs+unsubscribe@xxxxxxxxxxxxxxxxx
> > > > > To view this discussion on the web visit
> > > > https://groups.google.com/d/msgid/syzkaller-
> > > >
> > bugs/BL0PR1501MB20039998B662DCC11E2B38D79A410%40BL0PR1501MB200
> > > > 3.namprd15.prod.outlook.com.
> > > > > For more options, visit https://groups.google.com/d/optout.