Re: [GIT] Networking

From: Dave Jones
Date: Mon Jun 16 2014 - 19:43:13 EST


On Mon, Jun 16, 2014 at 07:04:50PM -0400, Dave Jones wrote:
> On Sun, Jun 15, 2014 at 07:33:12PM -0700, David Miller wrote:
>
> > 1) Fix checksumming regressions, from Tom Herbert.
>
> Something still not right for me here.
> After about 5 minutes, I get an oops and then instant reboot/lock up.
>
> I haven't managed to get a trace over usb-serial because it seems to
> crash before it completes. A hand-transcribed one looks like:
>
> rbp: ffff880236403970 r08: 0000000000000000 r09: 0000000000000000
> r10: 000000000000005a r11: 00000000000002d7 r12: ffff880233000d80
> r13: ffff8800aa1a6fc2 r14: ffff880233001d40 r15: 00000000ffffac82
> fs: 0 gs: ffff880236400000 knlGS: 0
> CS: 10 DS: 0 ES: 0 CR0: 80050033
> CR2: ffff8800aa1a8000 CR3: 1a0d000 CR4: 407f0
> Stack:
> ffff880236403988 ffffffff81298bbc 00000000000016c0 ffff8802364039e8
> ffffffff814ca05a ffff880233001d40 000005a80000e397 ffff880233001680
> 0000000000000000 0d420685ffffac82 000000000000012a 000000000000004e
> Call Trace:
> <IRQ>
> csum_partial
> tcp_gso_segment
> inet_gso_segment
> ? update_dl_migration
> skb_mac_gso_segment
> __skb_gso_segment
> dev_hard_start_xmit
> sch_direct_xmit
> __dev_queue_xmit
> ? dev_hard_start_xmit
> dev_queue_xmit
> ip_finish_output
> ? ip_output
> ip_output
> ip_forward_finish
> ip_forward
> ip_rcv_finish
> ip_rcv
> __netif_receive_skb_core
> ? __netif_receive_skb_core
> ? trace_hardirqs_on
> __netif_receive_skb
> netif_receive_skb_internal
> napi_gro_complete
> ? napi_gro_complete
> dev_gro_receive
> ? dev_gro_receive
> napi_gro_receive
> rtl8169_poll
> net_rx_action
> __do_softirq
> irq_exit
> do_IRQ
> common_interrupt
> <EOI>
> cpuidle_enter_state
> cpuidle_enter
> cpu_startup_entry
> rest_init
> ? csum_partial_copy_generic
> start_kernel
> RIP: do_csum+0x83/0x180
>
> Code: 41 89 d2 74 45 89 d1 45 31 c0 48 89 fa 0f 1f 00 48 03 02 48 13 42
> 08 48 13 42 10 48 13 42 20 48 13 42 28 48 13 42 30 <48> 13 42 38 4c 11
> c0 48 83 c2 40 83 e9 01 75 d5 41 83 ea 01 49
>
> All code
> ========
> 0: 41 89 d2 mov %edx,%r10d
> 3: 74 45 je 0x4a
> 5: 89 d1 mov %edx,%ecx
> 7: 45 31 c0 xor %r8d,%r8d
> a: 48 89 fa mov %rdi,%rdx
> d: 0f 1f 00 nopl (%rax)
> 10: 48 03 02 add (%rdx),%rax
> 13: 48 13 42 08 adc 0x8(%rdx),%rax
> 17: 48 13 42 10 adc 0x10(%rdx),%rax
> 1b: 48 13 42 20 adc 0x20(%rdx),%rax
> 1f: 48 13 42 28 adc 0x28(%rdx),%rax
> 23: 48 13 42 30 adc 0x30(%rdx),%rax
> 27:* 48 13 42 38 adc 0x38(%rdx),%rax <-- trapping instruction
> 2b: 4c 11 c0 adc %r8,%rax
> 2e: 48 83 c2 40 add $0x40,%rdx
> 32: 83 e9 01 sub $0x1,%ecx
> 35: 75 d5 jne 0xc
> 37: 41 83 ea 01 sub $0x1,%r10d
> 3b: 49 rex.WB
>
> Typical, rdx and rax had scrolled off the screen.

After removing the dump_stack invocations, I noticed that the reason
this is rebooting is probably that, right after the initial oops,
we hit the WARN_ON at arch/x86/kernel/smp.c:124:

	if (unlikely(cpu_is_offline(cpu))) {
		WARN_ON(1);
		return;
	}

lol.

Anyway, before all that nonsense, I now have the top of the oops:

BUG: unable to handle kernel paging request at ffff880218c18000
IP: do_csum+0x68
PGD: 2c6a067 PUD: 2c6d067 PMD: 23fd1c067 PTE: 80000000218c18060
RAX: 2090539bbf7b28f2 RBX: 00000000acb23d4e RCX: 000000000000000b
RDX: ffff880218c18000 RSI: 0000000000001c62 RDI: ffff880218c16680

Maybe also notable here is that the kernel is built with DEBUG_PAGEALLOC on.
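
A rough sanity check of those register values, as a sketch only. It assumes
4 KiB pages, and it assumes that RDI still holds the start of the buffer and
RSI the length handed to do_csum at the time of the fault; the trace does not
confirm either assumption.

```python
# Page arithmetic on the register values from the oops above.
# Assumptions (not confirmed by the trace): 4 KiB pages, RDI = buffer start,
# RSI = number of bytes do_csum was asked to sum.
PAGE_SIZE = 4096


def page_base(addr):
    """Return the address of the start of the page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


fault_addr = 0xffff880218c18000  # "unable to handle kernel paging request at"
buf_start = 0xffff880218c16680   # RDI
length = 0x1c62                  # RSI

buf_end = buf_start + length
fault_page_aligned = page_base(fault_addr) == fault_addr
fault_inside_buffer = buf_start <= fault_addr < buf_end
bytes_before_fault = fault_addr - buf_start
bytes_past_boundary = buf_end - fault_addr

print(f"buffer end:            {buf_end:#x}")
print(f"fault is page aligned: {fault_page_aligned}")
print(f"fault inside buffer:   {fault_inside_buffer}")
print(f"bytes read before it:  {bytes_before_fault:#x}")
print(f"bytes left past it:    {bytes_past_boundary:#x}")
```

Under those assumptions the faulting address is exactly the start of a page
that lies inside the range being checksummed. With DEBUG_PAGEALLOC, freed
pages are unmapped, so that would be consistent with the checksum running
into a page that is no longer allocated, though the registers alone do not
prove it.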

Dave
