Re: [GIT] Networking

From: Dave Jones
Date: Mon Jun 23 2014 - 19:48:16 EST


On Mon, Jun 16, 2014 at 07:42:54PM -0400, Dave Jones wrote:
> On Mon, Jun 16, 2014 at 07:04:50PM -0400, Dave Jones wrote:
> > On Sun, Jun 15, 2014 at 07:33:12PM -0700, David Miller wrote:
> >
> > > 1) Fix checksumming regressions, from Tom Herbert.
> >
> > Something still not right for me here.
> > After about 5 minutes, I get an oops and then instant reboot/lock up.
> >
> > I haven't managed to get a trace over usb-serial because it seems to
> > crash before it completes. Hand transcribed one looks like..
> >
> > rbp: ffff880236403970 r08: 0000000000000000 r09: 0000000000000000
> > r10: 000000000000005a r11: 00000000000002d7 r12: ffff880233000d80
> > r13: ffff8800aa1a6fc2 r14: ffff880233001d40 r15: 00000000ffffac82
> > fs: 0 gs: ffff880236400000 knlGS: 0
> > CS: 10 DS: 0 ES: 0 CR0: 80050033
> > CR2: ffff8800aa1a8000 CR3: 1a0d000 CR4: 407f0
> > Stack:
> > ffff880236403988 ffffffff81298bbc 00000000000016c0 ffff8802364039e8
> > ffffffff814ca05a ffff880233001d40 000005a80000e397 ffff880233001680
> > 0000000000000000 0d420685ffffac82 000000000000012a 000000000000004e
> > Call Trace:
> > <IRQ>
> > csum_partial
> > tcp_gso_segment
> > inet_gso_segment
> > ? update_dl_migration
> > skb_mac_gso_segment
> > __skb_gso_segment
> > dev_hard_start_xmit
> > sch_direct_xmit
> > __dev_queue_xmit
> > ? dev_hard_start_xmit
> > dev_queue_xmit
> > ip_finish_output
> > ? ip_output
> > ip_output
> > ip_forward_finish
> > ip_forward
> > ip_rcv_finish
> > ip_rcv
> > __netif_receive_skb_core
> > ? __netif_receive_skb_core
> > ? trace_hardirqs_on
> > __netif_receive_skb
> > netif_receive_skb_internal
> > napi_gro_complete
> > ? napi_gro_complete
> > dev_gro_receive
> > ? dev_gro_receive
> > napi_gro_receive
> > rtl8169_poll
> > net_rx_action
> > __do_softirq
> > irq_exit
> > do_IRQ
> > common_interrupt
> > <EOI>
> > cpuidle_enter_state
> > cpuidle_enter
> > cpu_startup_entry
> > rest_init
> > ? csum_partial_copy_generic
> > start_kernel
> > RIP: do_csum+0x83/0x180
> >
> > Code: 41 89 d2 74 45 89 d1 45 31 c0 48 89 fa 0f 1f 00 48 03 02 48 13 42
> > 08 48 13 42 10 48 13 42 20 48 13 42 28 48 13 42 30 <48> 13 42 38 4c 11
> > c0 48 83 c2 40 83 e9 01 75 d5 41 83 ea 01 49
> >
> > All code
> > ========
> >    0:   41 89 d2        mov    %edx,%r10d
> >    3:   74 45           je     0x4a
> >    5:   89 d1           mov    %edx,%ecx
> >    7:   45 31 c0        xor    %r8d,%r8d
> >    a:   48 89 fa        mov    %rdi,%rdx
> >    d:   0f 1f 00        nopl   (%rax)
> >   10:   48 03 02        add    (%rdx),%rax
> >   13:   48 13 42 08     adc    0x8(%rdx),%rax
> >   17:   48 13 42 10     adc    0x10(%rdx),%rax
> >   1b:   48 13 42 20     adc    0x20(%rdx),%rax
> >   1f:   48 13 42 28     adc    0x28(%rdx),%rax
> >   23:   48 13 42 30     adc    0x30(%rdx),%rax
> >   27:*  48 13 42 38     adc    0x38(%rdx),%rax    <-- trapping instruction
> >   2b:   4c 11 c0        adc    %r8,%rax
> >   2e:   48 83 c2 40     add    $0x40,%rdx
> >   32:   83 e9 01        sub    $0x1,%ecx
> >   35:   75 d5           jne    0xc
> >   37:   41 83 ea 01     sub    $0x1,%r10d
> >   3b:   49              rex.WB
> >
> > Typical, rdx and rax had scrolled off the screen.
>
> after removing the dump_stack invocations, I noticed that the reason
> this is rebooting is probably because right after the initial oops
> we hit the WARN_ON at arch/x86/kernel/smp.c:124
>
>     if (unlikely(cpu_is_offline(cpu))) {
>         WARN_ON(1);
>         return;
>     }
>
> lol.
>
> Anyway, before all that nonsense, I now have the top of the oops..
>
> BUG: unable to handle kernel paging request at ffff880218c18000
> IP: do_csum+0x68
> PGD: 2c6a067 PUD: 2c6d067 PMD: 23fd1c067 PTE: 80000000218c18060
> RAX: 2090539bbf7b28f2 RBX: 00000000acb23d4e RCX: 000000000000000b
> RDX: ffff880218c18000 RSI: 0000000000001c62 RDI: ffff880218c16680
>
> Maybe also notable here is that the kernel is built with DEBUG_PAGEALLOC on.

This is still a problem in -rc2.
Lasts about 5 minutes, then reboots.
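
Doing the arithmetic on the registers from the quoted oops suggests the
read runs off the end of a page. A minimal sketch, assuming RDI is still
the buffer start and RSI the byte length handed to do_csum (a guess from
the calling convention, not verified against the source). The macro and
helper names are made up for illustration:

```c
#include <stdint.h>

/* Register values copied from the quoted oops. */
#define OOPS_RDI 0xffff880218c16680ULL /* assumed: start of the buffer */
#define OOPS_RSI 0x1c62ULL             /* assumed: length in bytes */
#define OOPS_RDX 0xffff880218c18000ULL /* loop cursor, same as the faulting address */
#define OOPS_RCX 0xbULL                /* assumed: 64-byte loop iterations left */

#define PAGE_SIZE_BYTES 4096ULL        /* x86-64 base page size */

/* True if addr is the first byte of a page. */
static int is_page_aligned(uint64_t addr)
{
	return (addr & (PAGE_SIZE_BYTES - 1)) == 0;
}

/* Bytes the loop has walked from the buffer start to the cursor. */
static uint64_t bytes_consumed(uint64_t start, uint64_t cursor)
{
	return cursor - start;
}

/* Bytes the claimed length still covers at and beyond the cursor. */
static uint64_t bytes_remaining(uint64_t start, uint64_t len, uint64_t cursor)
{
	return start + len - cursor;
}
```

With these values RDX sits exactly on a page boundary, 0x1980 bytes
(102 iterations of 64 bytes) past RDI, and the claimed length would cover
another 0x2e2 bytes beyond it, which matches the 11 iterations left in RCX
plus a 34 byte tail. With DEBUG_PAGEALLOC the following page may simply be
unmapped, which would fit the fault, though that is only a guess.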

Dave
