Re: bug in networking code causes GPF

From: Daniel Borkmann
Date: Thu Nov 27 2014 - 08:55:14 EST


On 11/27/2014 02:35 PM, ÐÐÐÐÑÐÐ-ÑÐÐÐÑÐÐ wrote:
Hello,

I run IPVS in DR (direct routing) mode on 2 servers under heavy load, up to 1 Gbps of traffic.
From time to time, the server that currently holds the master IP (VIP) hits a general protection fault. Switching the master role to the other server makes no difference: after some time the GPF occurs there too. So I assume it is not a hardware issue.

Here are logs from both servers with different kernels (I run kernels with the grsecurity patch set from the Gentoo hardened portage tree):

Hmm, looks pretty much like ...

http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/54903

... which was a bug in the grsec patch set.

Does your grsec kernel have:

commit 0fa213cce614ad25a79acbd06f37f1e9022134d9
Author: Brad Spengler <spender@xxxxxxxxxxxxxx>
Date: Fri Oct 31 17:29:20 2014 -0400

From: Mathias Krause <minipli@xxxxxxxxxxxxxx>
To: PaX Team <pageexec@xxxxxxxxxxx>
Cc: Brad Spengler <spender@xxxxxxxxxxxxxx>, Mathias Krause
<minipli@xxxxxxxxxxxxxx>
Subject: [PATCH] pax: don't sanitize RCU slab caches

We cannot sanitize SLAB_DESTROY_BY_RCU slab caches in kmem_cache_free()
as there might be readers in this RCU period, wanting to access the
object.

Fix this, for now, by marking those with SLAB_NO_SANITIZE. Hopefully we
can have a real fix later on. But this should fix the RCU stalls and
netfilter conntrack related problems.

This patch should go on top of the previous patch.

Signed-off-by: Mathias Krause <minipli@xxxxxxxxxxxxxx>

[354497.931834] general protection fault: 0000 [#1] SMP
[354497.931903] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 3.13.10-hardened.standart.20140515 #1
[354497.931993] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.5 11/25/2013
[354497.932082] task: ffff88021e4b2ca0 ti: ffff88021e4b3100 task.ti: ffff88021e4b3100
[354497.932167] RIP: 0010:[<ffffffff81653ca2>] [<ffffffff81653ca2>] ffffffff81653ca2
[354497.932278] RSP: 0000:ffff88021fd03b98 EFLAGS: 00010246
[354497.932330] RAX: 0000000000013ba0 RBX: fefefefefefefefe RCX: 000000000001bc30
[354497.932413] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[354497.932497] RBP: ffff88021fd03c40 R08: 00000000cacb7f0b R09: ffff88021fd03c58
[354497.932580] R10: ffffffffffffffff R11: ffff88041de33280 R12: 8000000000000000
[354497.932663] R13: 0000000000003786 R14: ffffffff81a82540 R15: 0000000000000000
[354497.932749] FS: 000003853a8a7740(0000) GS:ffff88021fd00000(0000) knlGS:0000000000000000
[354497.932836] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[354497.932891] CR2: 000003d8a933b2d0 CR3: 000000000174a000 CR4: 00000000000407f0
[354497.932973] Stack:
[354497.933013] 0000000000000000 ffffffff81a82540 00000000de1b1efe 0000000000000000
[354497.933110] ffff88021fd03c40 ffffffff81653f6d ffffffff81a92cc0 ffffffff81a82540
[354497.933206] ffff88041d70c500 0000000000000000 00000000de1b1efe ffffffff81654f6c
[354497.933304] Call Trace:
[354497.933347] <IRQ>
[354497.933357] [<ffffffff81653f6d>] ? __nf_conntrack_find_get+0x28/0x13b
[354497.933484] [<ffffffff81654f6c>] ? nf_conntrack_in+0x253/0x73e
[354497.933544] [<ffffffff8164eeb6>] ? nf_iterate+0x40/0x7d
[354497.933601] [<ffffffff816a90a4>] ? inet_del_offload+0x39/0x39
[354497.933658] [<ffffffff8164ef5f>] ? nf_hook_slow+0x6c/0x104
[354497.933714] [<ffffffff816a90a4>] ? inet_del_offload+0x39/0x39
[354497.933770] [<ffffffff816a98c8>] ? ip_rcv+0x313/0x35f
[354497.933824] [<ffffffff816a93d1>] ? ip_local_deliver_finish+0xb8/0x11f
[354497.933885] [<ffffffff81627dfd>] ? __netif_receive_skb_core+0x44d/0x4e2
[354497.933944] [<ffffffff8162afba>] ? netif_receive_skb+0x4c/0x81
[354497.934000] [<ffffffff8162b488>] ? napi_gro_receive+0x35/0x7a
[354497.934058] [<ffffffff81515ddc>] ? igb_poll+0xa49/0xd13
[354497.934115] [<ffffffff810ce5b1>] ? __wake_up+0x38/0x49
[354497.934169] [<ffffffff8162b773>] ? net_rx_action+0xa6/0x172
[354497.934225] [<ffffffff810a31cc>] ? __do_softirq+0xb9/0x1ae
[354497.934280] [<ffffffff810a3499>] ? irq_exit+0x37/0x7a
[354497.934335] [<ffffffff81003ce2>] ? do_IRQ+0x96/0xb0
[354497.934389] [<ffffffff81725a97>] ? common_interrupt+0x97/0x97
[354497.934441] <EOI>
[354497.934451] [<ffffffff810e3080>] ? update_ts_time_stats+0x30/0x76
[354497.934548] [<ffffffff81009d20>] ? arch_remove_reservations+0x6a/0x6a
[354497.934607] [<ffffffff81009d23>] ? default_idle+0x3/0x9
[354497.934676] [<ffffffff8100a333>] ? arch_cpu_idle+0x6/0x1e
[354497.934732] [<ffffffff81009d20>] ? arch_remove_reservations+0x6a/0x6a
[354497.934791] [<ffffffff810d434a>] ? cpu_startup_entry+0xe9/0x15b
[354497.934850] [<ffffffff81024ccf>] ? start_secondary+0x2f9/0x32c
[354497.934903] Code: c2 85 d2 49 8b 86 d0 04 00 00 74 14 66 45 85 ff 75 0e 65 ff 40 04 e8 85 f6 a4 ff 48 89 d8 eb 69 65 ff 00 48 8b 1b f6 c3 01 75 0f <8b> 43 10 39 45 00 b8 00 00 00 00 74 83 eb 9d 48 d1 eb 4c 39 eb
[354497.935402] RIP [<ffffffff81653ca2>] ffffffff81653ca2
[354497.935456] RSP <ffff88021fd03b98>
[354497.935965] ---[ end trace 7d6f660245b2d541 ]---
[354497.936080] Kernel panic - not syncing: Fatal exception in interrupt
[354498.016801] Rebooting in 10 seconds.


[674944.621564] general protection fault: 0000 [#1] SMP
[674944.621637] CPU: 12 PID: 17984 Comm: nginx Not tainted 3.15.10-hardened-r1.standart.20140925 #1
[674944.621728] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.5 11/25/2013
[674944.621817] task: ffff88021e1d7700 ti: ffff88021e1d7c68 task.ti: ffff88021e1d7c68
[674944.621903] RIP: 0010:[<ffffffff816f2be8>] [<ffffffff816f2be8>] ffffffff816f2be8
[674944.621990] RSP: 0000:ffff88021fc03ce8 EFLAGS: 00010246
[674944.622057] RAX: ffffc90011901000 RBX: 822098c2102098c2 RCX: 000000005823edca
[674944.622143] RDX: fefefefefefefefe RSI: 000000009e90f1ad RDI: ffffffff81a8ad40
[674944.622226] RBP: 000000000050abb3 R08: 000000000050abb3 R09: 000000000001f106
[674944.622310] R10: ffffea00100cbd80 R11: ffffea00100cbd80 R12: 8000000000000000
[674944.622394] R13: ffffffff81a8ad40 R14: 0000000049c3f106 R15: ffffc900119f9830
[674944.622479] FS: 0000029d6fd04740(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
[674944.622566] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[674944.622619] CR2: ffffffffff600400 CR3: 0000000001787000 CR4: 00000000000407f0
[674944.622701] Stack:
[674944.622741] ffffffff816e9360 ffffffff00000050 ffffffff822098c2 abb3000280000000
[674944.622839] ffff88006e9c2b00 ffff88011cbd1bce ffff88021e0c0000 0000000000000008
[674944.622935] ffffffff81a955b0 ffffffff8170920a ffff880100000003 0000000000000008
[674944.623031] Call Trace:
[674944.623077] <IRQ>
[674944.623087] [<ffffffff816e9360>] ? inet_del_offload+0x39/0x39
[674944.623192] [<ffffffff8170920a>] ? tcp_v4_early_demux+0x14c/0x1bd
[674944.623250] [<ffffffff816e93b0>] ? ip_rcv_finish+0x50/0x2c1
[674944.623326] [<ffffffff8165ee92>] ? __netif_receive_skb_core+0x3c8/0x456
[674944.623386] [<ffffffff8165f10c>] ? netif_receive_skb_internal+0x4c/0x81
[674944.623447] [<ffffffff816623b3>] ? napi_gro_receive+0x36/0x7c
[674944.623511] [<ffffffff815485a5>] ? igb_poll+0xa8b/0xd5b
[674944.623572] [<ffffffff810f7fda>] ? __note_gp_changes+0x31/0x61
[674944.623630] [<ffffffff816626cf>] ? net_rx_action+0xa6/0x172
[674944.623688] [<ffffffff810bc995>] ? __do_softirq+0xf6/0x1fb
[674944.623744] [<ffffffff810bcbf4>] ? irq_exit+0x38/0x7c
[674944.623798] [<ffffffff81003ce3>] ? do_IRQ+0xb3/0xce
[674944.623853] [<ffffffff81767217>] ? common_interrupt+0x97/0x97
[674944.623906] <EOI>
[674944.623917] Code: 6a d4 75 0e 48 39 5a c8 74 51 eb 06 3b 44 24 50 74 50 4c 89 4c 24 08 e8 e8 fe ff ff 4c 8b 4c 24 08 eb 83 48 8b 12 f6 c2 01 75 0b <44> 39 72 d0 75 f2 e9 75 ff ff ff 48 d1 ea 4c 39 ca 0f 85 64 ff
[674944.624456] RIP [<ffffffff816f2be8>] ffffffff816f2be8
[674944.624536] RSP <ffff88021fc03ce8>
[674944.625020] ---[ end trace 8035e2b5322bab00 ]---
[674944.625126] Kernel panic - not syncing: Fatal exception in interrupt
[674944.706563] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[674944.706711] Rebooting in 10 seconds.


[7523332.314991] general protection fault: 0000 [#1] SMP
[7523332.315078] CPU: 4 PID: 25432 Comm: nginx Not tainted 3.15.8-hardened.standart.20140901 #1
[7523332.315172] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
[7523332.315266] task: ffff88041eb98000 ti: ffff88041eb98568 task.ti: ffff88041eb98568
[7523332.315355] RIP: 0010:[<ffffffff8168db79>] [<ffffffff8168db79>] ffffffff8168db79
[7523332.315446] RSP: 0018:ffff88021fa03bf8 EFLAGS: 00010246
[7523332.316983] RAX: 00000000000149c0 RBX: ffffffff81a8ac80 RCX: 00000000000011d5
[7523332.317070] RDX: 0000000000000000 RSI: 0000000000008ea8 RDI: ffffffff81a8acfe
[7523332.317187] RBP: ffff88021fa03c5c R08: 00000000b96542ae R09: ffff88021fa03c74
[7523332.317274] R10: 0000000000000002 R11: ffff880238b8ce00 R12: 8000000000000000
[7523332.317360] R13: fefefefefefefefe R14: 0000000000000000 R15: 0000000047567b68
[7523332.317448] FS: 0000031d200c5740(0000) GS:ffff88021fa00000(0000) knlGS:0000000000000000
[7523332.317538] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7523332.317594] CR2: 000004373dcef000 CR3: 0000000001779000 CR4: 00000000000007f0
[7523332.317679] Stack:
[7523332.317722] 0000000000000000 ffffffff81a8ac80 ffff880003e08200 0000000000000000
[7523332.317824] ffffffff81a9bf60 ffffffff8168ef87 ffffffff81a9bf60 ffffffff81a96970
[7523332.317925] 0000000047567b68 ffffffff81a96970 0000000281a90002 0000000000000014
[7523332.318026] Call Trace:
[7523332.318072] <IRQ>
[7523332.318085] [<ffffffff8168ef87>] ? nf_conntrack_in+0x2c1/0x846
[7523332.318199] [<ffffffff81688956>] ? nf_iterate+0x41/0x81
[7523332.318259] [<ffffffff816ea4b8>] ? inet_del_offload+0x39/0x39
[7523332.318321] [<ffffffff81688a0c>] ? nf_hook_slow+0x76/0x111
[7523332.318393] [<ffffffff816ea4b8>] ? inet_del_offload+0x39/0x39
[7523332.318453] [<ffffffff816eacf2>] ? ip_rcv+0x2f4/0x356
[7523332.318512] [<ffffffff81660173>] ? __netif_receive_skb_core+0x3d9/0x410
[7523332.318575] [<ffffffff8166039c>] ? netif_receive_skb_internal+0x6d/0x77
[7523332.318640] [<ffffffff816634c1>] ? napi_gro_receive+0x36/0x7c
[7523332.318702] [<ffffffff8154a30d>] ? igb_poll+0xa46/0xd09
[7523332.318762] [<ffffffff813bbd0d>] ? __list_add+0x1b/0x37
[7523332.318820] [<ffffffff816637d2>] ? net_rx_action+0xa0/0x171
[7523332.318882] [<ffffffff810bcb7a>] ? __do_softirq+0xf7/0x1fa
[7523332.318943] [<ffffffff8176a29c>] ? do_softirq_own_stack+0x1c/0x30
[7523332.318999] <EOI>
[7523332.319013] [<ffffffff810bcccb>] ? do_softirq+0x24/0x2c
[7523332.319112] [<ffffffff810bcd39>] ? __local_bh_enable_ip+0x66/0x74
[7523332.319174] [<ffffffff8172f029>] ? ipt_do_table+0x5c6/0x5f0
[7523332.319235] [<ffffffff81688956>] ? nf_iterate+0x41/0x81
[7523332.319293] [<ffffffff816ed488>] ? ip_options_rcv_srr+0x1c7/0x1c7
[7523332.319354] [<ffffffff81688a0c>] ? nf_hook_slow+0x76/0x111
[7523332.319412] [<ffffffff816ed488>] ? ip_options_rcv_srr+0x1c7/0x1c7
[7523332.319473] [<ffffffff816ee3a2>] ? __ip_local_out+0x64/0x6e
[7523332.319533] [<ffffffff8164f4a3>] ? __sk_dst_check+0x34/0x63
[7523332.319617] [<ffffffff816ee3be>] ? ip_local_out_sk+0x12/0x39
[7523332.319676] [<ffffffff816eea83>] ? ip_queue_xmit+0x2ab/0x2db
[7523332.319739] [<ffffffff81703a1e>] ? tcp_transmit_skb+0x6eb/0x735
[7523332.319801] [<ffffffff81704323>] ? tcp_write_xmit+0x82e/0x969
[7523332.319861] [<ffffffff816f7278>] ? tcp_sendpage+0x50b/0x5e4
[7523332.319923] [<ffffffff811845e9>] ? direct_splice_actor+0x49/0x49
[7523332.319986] [<ffffffff8171a807>] ? inet_sendpage+0xbc/0xe0
[7523332.320045] [<ffffffff8164eacc>] ? kernel_sendpage+0x49/0x59
[7523332.320104] [<ffffffff8164eb23>] ? sock_sendpage+0x47/0x53
[7523332.320163] [<ffffffff81184658>] ? pipe_to_sendpage+0x6f/0x7c
[7523332.320223] [<ffffffff81185aa8>] ? splice_from_pipe_feed+0x7f/0x10e
[7523332.320285] [<ffffffff811845e9>] ? direct_splice_actor+0x49/0x49
[7523332.320347] [<ffffffff81185c2e>] ? __splice_from_pipe+0x3a/0x6b
[7523332.320408] [<ffffffff81185dff>] ? splice_from_pipe+0x66/0x87
[7523332.320468] [<ffffffff811845e9>] ? direct_splice_actor+0x49/0x49
[7523332.320533] [<ffffffff811845df>] ? direct_splice_actor+0x3f/0x49
[7523332.320599] [<ffffffff811860f5>] ? splice_direct_to_actor+0xd3/0x18d
[7523332.320661] [<ffffffff811845a0>] ? generic_pipe_buf_nosteal+0xc/0xc
[7523332.320723] [<ffffffff81186249>] ? do_splice_direct+0x9a/0xb6
[7523332.320783] [<ffffffff8115e7f2>] ? do_sendfile+0x182/0x32a
[7523332.320856] [<ffffffff811602bd>] ? SyS_sendfile64+0x137/0x1bc
[7523332.320916] [<ffffffff81768f37>] ? system_call_fastpath+0x16/0x1b
[7523332.320972] Code: 00 02 00 00 48 c7 c7 4d db 68 81 65 ff 40 04 e8 71 f1 a2 ff 4d 85 ed 75 58 e9 94 01 00 00 65 ff 00 4d 8b 6d 00 41 f6 c5 01 75 18 <41> 8b 55 10 31 c0 39 55 00 41 8a 7d 37 0f 85 14 ff ff ff e9 e7
[7523332.321522] RIP [<ffffffff8168db79>] ffffffff8168db79
[7523332.321579] RSP <ffff88021fa03bf8>
[7523332.322094] ---[ end trace 0e21b79561002306 ]---
[7523332.322210] Kernel panic - not syncing: Fatal exception in interrupt

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
