tipc_node_calculate_timer div-by-zero

From: Tommi Rantala
Date: Mon Nov 27 2017 - 04:51:04 EST


Hi,

I'm seeing a rare TIPC div-by-zero crash in tipc_node_calculate_timer().

If I get it right, we're receiving a discovery packet while enabling a bearer. The bearer is not yet fully initialized, causing the discovery packet processing to use zero tolerance value, which then causes the div-by-zero in tipc_node_calculate_timer().

I can reproduce the issue easily by adding some sleep in tipc_enable_bearer(), and the machine always crashes after booting it up:

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 47ec121574ce..00bdd87cd614 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -327,12 +327,16 @@ static int tipc_enable_bearer(struct net *net, const char *name,
return -EINVAL;
}

+ pr_warn("HACK: sleeping 2 seconds!");
+ usleep_range(2000000, 2100000);
+
b->identity = bearer_id;
b->tolerance = m->tolerance;
b->window = m->window;


In 4.14 removing the latter tipc_rcv() call in tipc_udp_recv() [which was added in in commit c9b64d492b1f (tipc: add replicast peer discovery)] seems to fix the crash, but I do not really understand the code:

diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
index ecca64fc6a6f..371653acf1f6 100644
--- a/net/tipc/udp_media.c
+++ b/net/tipc/udp_media.c
@@ -371,9 +371,11 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
goto rcu_out;
}

+ /*
tipc_rcv(sock_net(sk), skb, b);
rcu_read_unlock();
return 0;
+ */

rcu_out:
rcu_read_unlock();



I have tested 4.4 and 4.14, and the bug is reproducible in both versions.

Comments?

-Tommi



[ 12.576885] tipc: Activated (version 2.0.0)
[ 12.577506] NET: Registered protocol family 30
[ 12.578212] tipc: Started in single node mode
[ 12.589907] Started in network mode
[ 12.590450] Own node address <1.1.1>, network identity 1
[ 12.592022] HACK: sleeping 2 seconds!
[ 12.668088] divide error: 0000 [#1] SMP
[ 12.670993] Modules linked in: tipc ip6_udp_tunnel udp_tunnel nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc isofs aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ata_piix sch_fq_codel autofs4
[ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-pc64-dirty #1
[ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
[ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
[ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
[ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
[ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
[ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
[ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
[ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
[ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
[ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
[ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 12.712627] Call Trace:
[ 12.713390] <IRQ>
[ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc]
[ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc]
[ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc]
[ 12.717396] ? minmax_running_min+0x2f/0x60
[ 12.718248] ? dst_alloc+0x4c/0xa0
[ 12.718964] ? tcp_ack+0xaf1/0x10b0
[ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
[ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc]
[ 12.721459] ? dst_alloc+0x4c/0xa0
[ 12.722130] udp_queue_rcv_skb+0x264/0x490
[ 12.722924] __udp4_lib_rcv+0x21e/0x990
[ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0
[ 12.724442] ? tcp_v4_rcv+0x958/0xa40
[ 12.725039] udp_rcv+0x1a/0x20
[ 12.725587] ip_local_deliver_finish+0x97/0x1d0
[ 12.726323] ip_local_deliver+0xaf/0xc0
[ 12.726959] ? ip_route_input_noref+0x19/0x20
[ 12.727689] ip_rcv_finish+0xdd/0x3b0
[ 12.728307] ip_rcv+0x2ac/0x360
[ 12.728839] __netif_receive_skb_core+0x6fb/0xa90
[ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0
[ 12.730274] __netif_receive_skb+0x1d/0x60
[ 12.730953] ? __netif_receive_skb+0x1d/0x60
[ 12.731637] netif_receive_skb_internal+0x37/0xd0
[ 12.732371] napi_gro_receive+0xc7/0xf0
[ 12.732920] receive_buf+0x3c3/0xd40
[ 12.733441] virtnet_poll+0xb1/0x250
[ 12.733944] net_rx_action+0x23e/0x370
[ 12.734476] __do_softirq+0xc5/0x2f8
[ 12.734922] irq_exit+0xfa/0x100
[ 12.735315] do_IRQ+0x4f/0xd0
[ 12.735680] common_interrupt+0xa2/0xa2
[ 12.736126] </IRQ>
[ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10
[ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
[ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
[ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
[ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
[ 12.741831] default_idle+0x2a/0x100
[ 12.742323] arch_cpu_idle+0xf/0x20
[ 12.742796] default_idle_call+0x28/0x40
[ 12.743312] do_idle+0x179/0x1f0
[ 12.743761] cpu_startup_entry+0x1d/0x20
[ 12.744291] start_secondary+0x112/0x120
[ 12.744816] secondary_startup_64+0xa5/0xa5
[ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
[ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
[ 12.748555] ---[ end trace 1399ab83390650fd ]---
[ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt
[ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 12.751215] Rebooting in 60 seconds..


[ 12.561265] tipc: Activated (version 2.0.0)
[ 12.561997] NET: Registered protocol family 30
[ 12.562686] tipc: Started in single node mode
[ 12.569696] Started in network mode
[ 12.570401] Own node address <1.1.1>, network identity 1
[ 12.592022] HACK: sleeping 2 seconds!
[ 13.792592] divide error: 0000 [#1] SMP
[ 13.793539] Modules linked in: tipc ...
[ 13.799203] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.101-dirty #9
[ 13.800235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 13.801531] task: ffff8802355ebc00 ti: ffff8802355f8000 task.ti: ffff8802355f8000
[ 13.803134] RIP: 0010:[<ffffffffa03b6252>] [<ffffffffa03b6252>] tipc_node_calculate_timer.isra.6+0x52/0x60 [tipc]
[ 13.804974] RSP: 0018:ffff88023fc838a0 EFLAGS: 00010246
[ 13.805806] RAX: 0000000000000000 RBX: ffff88023fc83996 RCX: 0000000000000000
[ 13.806826] RDX: 0000000000000000 RSI: ffff8802275fd454 RDI: 0000000000000000
[ 13.807890] RBP: ffff88023fc838b0 R08: 0000000000000001 R09: 0000000000000001
[ 13.808915] R10: 0000000000000000 R11: ffff8802275fd470 R12: ffff8802275fd460
[ 13.809865] R13: ffff8802275fd1d8 R14: 0000000000000000 R15: ffff8802275fd000
[ 13.810932] FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[ 13.812152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.813140] CR2: 00007fa9aac6b6e0 CR3: 00000000b2e96000 CR4: 00000000003406e0
[ 13.814257] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 13.815513] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 13.816727] Stack:
[ 13.817256] ffff88023258b400 0000000000000000 ffff88023fc83960 ffffffffa03b7102
[ 13.819080] ffffffff00000000 0000000000000000 ffffea000000edd8 0000000001001001
[ 13.820998] ffff880001001004 ffff880200000002 ffff8800b928b000 ffff8802275fcc00
[ 13.822880] Call Trace:
[ 13.823472] <IRQ>
[ 13.823869] [<ffffffffa03b7102>] tipc_node_check_dest+0x342/0x3f0 [tipc]
[ 13.825389] [<ffffffffa03af2db>] tipc_disc_rcv+0x14b/0x1d0 [tipc]
[ 13.826571] [<ffffffffa03b7ab1>] tipc_rcv+0x6d1/0xc40 [tipc]
[ 13.827619] [<ffffffff8207dde5>] ? wake_up_process+0x15/0x20
[ 13.828656] [<ffffffffa0288de0>] ? svc_xprt_do_enqueue+0x100/0x240 [sunrpc]
[ 13.829865] [<ffffffffa03be3c0>] ? tipc_udp_send_msg+0x240/0x240 [tipc]
[ 13.831033] [<ffffffffa03be401>] tipc_udp_recv+0x41/0x80 [tipc]
[ 13.832126] [<ffffffff82500c77>] udp_queue_rcv_skb+0x327/0x440
[ 13.833234] [<ffffffff825013c4>] __udp4_lib_rcv+0x154/0x6d0
[ 13.834290] [<ffffffff824cf321>] ? ip_route_input_noref+0x331/0xe20
[ 13.835410] [<ffffffff82501e4a>] udp_rcv+0x1a/0x20