Re: ip_rcv_finish() NULL pointer and possibly related Oopses

From: Daniel Borkmann
Date: Wed Sep 02 2015 - 20:12:54 EST


On 09/02/2015 06:39 PM, Shaun Crampton wrote:
Make sure you backported commit
10e2eb878f3ca07ac2f05fa5ca5e6c4c9174a27a
("udp: fix dst races with multicast early demux")

I just tried the latest CoreOS alpha, which had that patch. Sadly, I saw
just as many reboots. Here's a sample of the different types of Oopses I
see (I've put the rest up in a gist:
https://gist.github.com/fasaxc/d801ced5608f2657abd8):

[ 4024.564479] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 4024.565452] IP: [< (null)>] (null)
[ 4024.565452] PGD 2297067 PUD 2296067 PMD 0
[ 4024.565452] Oops: 0010 [#1] SMP
[ 4024.565452] Modules linked in: xt_mac xt_mark veth ip_set_hash_net
nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_set ip_set_hash_ip ip_set
nfnetlink ipip tunnel4 ip_tunnel ip6table_filter ip6_tables xt_conntrack
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter br_netfilter nf_nat
nf_conntrack bridge stp llc overlay nls_ascii nls_cp437 vfat fat ext4
crc16 mbcache jbd2 sd_mod crc32c_intel virtio_scsi scsi_mod aesni_intel
virtio_net mousedev aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd
microcode firmware_class virtio_pci virtio_ring psmouse virtio i2c_piix4
i2c_core acpi_cpufreq button evdev sch_fq_codel ip_tables autofs4
[ 4024.565452] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.6-coreos-r1 #2
[ 4024.565452] Hardware name: Google Google, BIOS Google 01/01/2011
[ 4024.565452] task: ffffffff81a154c0 ti: ffffffff81a00000 task.ti:
ffffffff81a00000
[ 4024.565452] RIP: 0010:[<0000000000000000>] [< (null)>]
(null)
[ 4024.565452] RSP: 0018:ffff88021fc03c00 EFLAGS: 00010246
[ 4024.565452] RAX: ffff880003375d00 RBX: ffff880003375d00 RCX:
0000000000000001
[ 4024.565452] RDX: ffff88000306c000 RSI: 0000000000000000 RDI:
ffff880003375d00
[ 4024.565452] RBP: ffff88021fc03c28 R08: 0000000000005608 R09:
000000000000bb84
[ 4024.565452] R10: 0000000000000003 R11: ffff880215a30dc0 R12:
ffff880214bfb000
[ 4024.565452] R13: ffff88000306c000 R14: ffff88000306c000 R15:
0000000000000008
[ 4024.565452] FS: 0000000000000000(0000) GS:ffff88021fc00000(0000)
knlGS:0000000000000000
[ 4024.565452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4024.565452] CR2: 0000000000000000 CR3: 0000000001d92000 CR4:
00000000001406f0
[ 4024.600761] Stack:
[ 4024.601081] ffffffff814ac9dc ffff880000000002 ffff88000306c000
ffff880003375d00
[ 4024.601081] ffff88008cbba84e ffff88021fc03c58 ffffffff81486628
ffff88021690a000
[ 4024.601081] ffff88008cbba84e ffff880003375d00 ffff88000306c000
ffff88021fc03cb8
[ 4024.601081] Call Trace:
[ 4024.601081] <IRQ>
[ 4024.601081] [<ffffffff814ac9dc>] ? tcp_v4_early_demux+0x11c/0x160
[ 4024.601081] [<ffffffff81486628>] ip_rcv_finish+0xb8/0x360
[ 4024.601081] [<ffffffff81486f84>] ip_rcv+0x2a4/0x400
[ 4024.601081] [<ffffffff81486570>] ? inet_del_offload+0x40/0x40
[ 4024.601081] [<ffffffff81449053>] __netif_receive_skb_core+0x6c3/0x9a0
[ 4024.601081] [<ffffffff8143b507>] ? build_skb+0x17/0x90
[ 4024.601081] [<ffffffff81449348>] __netif_receive_skb+0x18/0x60
[ 4024.601081] [<ffffffff814493c3>] netif_receive_skb_internal+0x33/0xa0
[ 4024.601081] [<ffffffff8144944c>] netif_receive_skb_sk+0x1c/0x70
[ 4024.601081] [<ffffffffa008772b>] 0xffffffffa008772b
[ 4024.601081] [<ffffffff81096cb0>] ? check_preempt_curr+0x80/0xa0
[ 4024.601081] [<ffffffffa0087d81>] 0xffffffffa0087d81

Looking at this one, I am still puzzled where 0xffffffffa008772b and
0xffffffffa0087d81 come from ... some driver, bridge ...? Also the call
to inet_del_offload() seems a bit odd. Even in 4.1, there's only one (buggy)
caller of inet_del_offload(), which is ipv6_exthdrs_offload_init(),
but IPPROTO_ROUTING shouldn't have much of an effect on the v4 table as
far as I can see. Maybe that address is rather a false positive, hmm? Perhaps
some callback/infrastructure vanished underneath us, as ip/rip are both null
... maybe that is also why 0xffffffffa008772b / 0xffffffffa0087d81 don't
resolve?
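
One way to narrow down where the two unresolved addresses live would be to
compare them against the module load ranges in /proc/modules on an affected
machine (same boot, before it panics). Below is a rough sketch of that lookup.
Assumptions: the helper name find_module() is made up for illustration, the
sample table uses invented module names, sizes and base addresses rather than
data from this crash, and on a real system the address column may read as
zero for unprivileged readers.

```python
def find_module(addr, modules_text):
    """Return the name of the module whose load range contains addr, or None.

    modules_text is expected in /proc/modules layout:
        name size refcount deps state base_address [taint flags]
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[0]
        size = int(fields[1])
        base = int(fields[5], 16)
        # A zero base means the kernel hid the address from this reader.
        if base and base <= addr < base + size:
            return name
    return None


# Invented sample, NOT taken from the crashing machine.
SAMPLE_MODULES = """\
mod_a 28672 0 - Live 0xffffffffa0083000
mod_b 16384 0 - Live 0xffffffffa0120000
mod_c 20480 1 mod_b, Live 0x0000000000000000
"""

if __name__ == "__main__":
    for addr in (0xffffffffa008772b, 0xffffffffa0087d81):
        print(hex(addr), "->", find_module(addr, SAMPLE_MODULES))
```

On a live system the same function could be fed the contents of
/proc/modules read as root. If neither address falls into any listed range,
that would fit the theory that whatever owned them had already gone away.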

[ 4024.601081] [<ffffffff81449819>] net_rx_action+0x159/0x340
[ 4024.601081] [<ffffffff810715f4>] __do_softirq+0xf4/0x290
[ 4024.601081] [<ffffffff810719fd>] irq_exit+0xad/0xc0
[ 4024.601081] [<ffffffff815527fa>] do_IRQ+0x5a/0xf0
[ 4024.601081] [<ffffffff815506ae>] common_interrupt+0x6e/0x6e
[ 4024.601081] <EOI>
[ 4024.601081] [<ffffffff81059bd6>] ? native_safe_halt+0x6/0x10
[ 4024.601081] [<ffffffff8101f17e>] default_idle+0x1e/0xc0
[ 4024.601081] [<ffffffff8101fc5f>] arch_cpu_idle+0xf/0x20
[ 4024.601081] [<ffffffff810b0ab4>] cpu_startup_entry+0x314/0x3e0
[ 4024.601081] [<ffffffff8153bbec>] rest_init+0x7c/0x80
[ 4024.601081] [<ffffffff81b130e0>] start_kernel+0x483/0x490
[ 4024.601081] [<ffffffff81b12a4d>] ? set_init_arg+0x55/0x55
[ 4024.601081] [<ffffffff81b12120>] ? early_idt_handler_array+0x120/0x120
[ 4024.601081] [<ffffffff81b125ee>] x86_64_start_reservations+0x2a/0x2c
[ 4024.601081] [<ffffffff81b12728>] x86_64_start_kernel+0x138/0x147
[ 4024.601081] Code: Bad RIP value.
[ 4024.601081] RIP [< (null)>] (null)
[ 4024.601081] RSP <ffff88021fc03c00>
[ 4024.601081] CR2: 0000000000000000
[ 4024.601081] ---[ end trace cdabfe9d7380aaab ]---
[ 4024.601081] Kernel panic - not syncing: Fatal exception in interrupt
[ 4024.601081] Kernel Offset: disabled
[ 4024.601081] Rebooting in 60 seconds..
[ 4024.601081] ACPI MEMORY or I/O RESET_REG.