Kernel 4.1.6 Panic due to slab corruption
From: Nikolay Borisov
Date: Mon Sep 07 2015 - 04:41:28 EST
Hello,
On one of our servers I've observed a kernel panic
with the following backtrace:
[654405.527070] BUG: unable to handle kernel paging request at 0000000000028001
[654405.527076] IP: [<ffffffff81182a59>] kmem_cache_alloc_node+0x99/0x1e0
[654405.527085] PGD 14bef58067 PUD 2ab358067 PMD 0
[654405.527089] Oops: 0000 [#11] SMP
[654405.527093] Modules linked in: xt_multiport tcp_diag inet_diag act_police cls_basic sch_ingress scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_pkttype xt_state veth openvswitch xt_owner xt_conntrack iptable_filter iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler mpt2sas scsi_transport_sas raid_class
[654405.527145] CPU: 14 PID: 32267 Comm: httpd Tainted: G D L 4.1.6-clouder1 #1
[654405.527147] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0 07/09/2013
[654405.527149] task: ffff88139d3b1ec0 ti: ffff8808eda14000 task.ti: ffff8808eda14000
[654405.527151] RIP: 0010:[<ffffffff81182a59>] [<ffffffff81182a59>] kmem_cache_alloc_node+0x99/0x1e0
[654405.527155] RSP: 0018:ffff88407fcc3a98 EFLAGS: 00210246
[654405.527156] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8814ce9acf80
[654405.527157] RDX: 00000000837ad864 RSI: 0000000000050200 RDI: 0000000000018ce0
[654405.527158] RBP: ffff88407fcc3af8 R08: ffff88407fcd8ce0 R09: ffffffffa033d990
[654405.527159] R10: ffff88058676fdd8 R11: 0000000000007b4a R12: ffff881fff807ac0
[654405.527161] R13: 0000000000028001 R14: 0000000000000001 R15: ffff881fff807ac0
[654405.527162] FS: 0000000000000000(0000) GS:ffff88407fcc0000(0063) knlGS:0000000055c832e0
[654405.527164] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[654405.527165] CR2: 0000000000028001 CR3: 0000001467b64000 CR4: 00000000000406e0
[654405.527166] Stack:
[654405.527167] 0000000000000000 0000000000000000 0000000000000000 ffff881ff2d05000
[654405.527170] ffff88407fcc3ae8 00050200812b5903 ffff88407fcc3ae8 00000000000001a2
[654405.527172] 0000000000000001 ffff88058676fc60 ffff88058676fe80 0000000000001800
[654405.527175] Call Trace:
[654405.527177] <IRQ>
[654405.527184] [<ffffffffa033d990>] ovs_flow_stats_update+0x110/0x160 [openvswitch]
[654405.527189] [<ffffffffa033ae74>] ovs_dp_process_packet+0x64/0xf0 [openvswitch]
[654405.527193] [<ffffffffa0345c60>] ? netdev_port_receive+0x110/0x110 [openvswitch]
[654405.527197] [<ffffffffa0345c60>] ? netdev_port_receive+0x110/0x110 [openvswitch]
[654405.527201] [<ffffffffa0344815>] ovs_vport_receive+0x85/0xb0 [openvswitch]
[654405.527207] [<ffffffff812c7636>] ? blk_mq_free_hctx_request+0x36/0x40
[654405.527209] [<ffffffff812c7671>] ? blk_mq_free_request+0x31/0x40
[654405.527214] [<ffffffff8100c2f9>] ? read_tsc+0x9/0x10
[654405.527220] [<ffffffff810b9f04>] ? ktime_get+0x54/0xc0
[654405.527225] [<ffffffff813cf577>] ? put_device+0x17/0x20
[654405.527227] [<ffffffffa0048a50>] ? tcf_act_police+0x150/0x210 [act_police]
[654405.527232] [<ffffffff8150cdc1>] ? tcf_action_exec+0x51/0xa0
[654405.527235] [<ffffffffa0011445>] ? basic_classify+0x75/0xe0 [cls_basic]
[654405.527237] [<ffffffff815091d5>] ? tc_classify+0x55/0xc0
[654405.527241] [<ffffffffa0345bed>] netdev_port_receive+0x9d/0x110 [openvswitch]
[654405.527245] [<ffffffffa0345c94>] netdev_frame_hook+0x34/0x50 [openvswitch]
[654405.527250] [<ffffffff814e58e6>] __netif_receive_skb_core+0x206/0x880
[654405.527252] [<ffffffff814e5f87>] __netif_receive_skb+0x27/0x70
[654405.527254] [<ffffffff814e60c1>] process_backlog+0xf1/0x1b0
[654405.527257] [<ffffffff814e68d3>] napi_poll+0xd3/0x1c0
[654405.527259] [<ffffffff814e6a50>] net_rx_action+0x90/0x1c0
[654405.527264] [<ffffffff810595ab>] __do_softirq+0xfb/0x2a0
[654405.527270] [<ffffffff815b269c>] do_softirq_own_stack+0x1c/0x30
[654405.527271] <EOI>
[654405.527273] [<ffffffff810590b5>] do_softirq+0x55/0x60
[654405.527276] [<ffffffff81059198>] __local_bh_enable_ip+0x88/0x90
[654405.527279] [<ffffffff8152b062>] ip_finish_output+0x282/0x490
[654405.527281] [<ffffffff8152b55b>] ip_output+0xab/0xc0
[654405.527283] [<ffffffff8152ade0>] ? ip_finish_output_gso+0x4e0/0x4e0
[654405.527285] [<ffffffff815296fb>] ip_local_out_sk+0x3b/0x50
[654405.527287] [<ffffffff81529e0e>] ip_queue_xmit+0x14e/0x3c0
[654405.527291] [<ffffffff815422d2>] tcp_transmit_skb+0x4c2/0x850
[654405.527294] [<ffffffff81544c1d>] tcp_write_xmit+0x19d/0x670
[654405.527298] [<ffffffff812f32d1>] ? copy_user_generic_string+0x31/0x40
[654405.527300] [<ffffffff81545cd2>] __tcp_push_pending_frames+0x32/0xd0
[654405.527302] [<ffffffff81532911>] tcp_push+0xf1/0x120
[654405.527304] [<ffffffff815361f3>] tcp_sendmsg+0x373/0xb60
[654405.527307] [<ffffffff811be0b3>] ? mntput+0x23/0x40
[654405.527310] [<ffffffff811a7c32>] ? path_put+0x22/0x30
[654405.527315] [<ffffffff81561272>] inet_sendmsg+0x42/0xb0
[654405.527317] [<ffffffff81182e4e>] ? kmem_cache_alloc+0xee/0x1c0
[654405.527321] [<ffffffff814c639d>] sock_sendmsg+0x4d/0x60
[654405.527324] [<ffffffff814c64a6>] sock_write_iter+0xb6/0x100
[654405.527328] [<ffffffff8119d9d0>] do_iter_readv_writev+0x60/0x90
[654405.527330] [<ffffffff814c63f0>] ? kernel_sendmsg+0x40/0x40
[654405.527332] [<ffffffff8119e354>] compat_do_readv_writev+0x174/0x1f0
[654405.527337] [<ffffffff810aa6d9>] ? rcu_eqs_exit+0x79/0xb0
[654405.527339] [<ffffffff810aa723>] ? rcu_user_exit+0x13/0x20
[654405.527342] [<ffffffff8119e591>] compat_SyS_writev+0xc1/0x110
[654405.527346] [<ffffffff811274a3>] ? context_tracking_user_enter+0x13/0x20
[654405.527349] [<ffffffff815b2fc5>] sysenter_dispatch+0x7/0x25
[654405.527350] Code: 8b 00 48 c1 e8 38 41 39 c6 74 17 4c 89 c9 44 89 f2 8b 75 cc 4c 89 e7 e8 46 f6 ff ff 49 89 c5 eb 2b 90 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
[654405.527378] RIP [<ffffffff81182a59>] kmem_cache_alloc_node+0x99/0x1e0
[654405.527381] RSP <ffff88407fcc3a98>
[654405.527383] CR2: 0000000000028001
Before this, there were several more "unable to handle kernel paging request" oopses, e.g.:
[654405.518482] BUG: unable to handle kernel paging request at 0000000000028001
[654405.518488] IP: [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.518496] PGD 364da24067 PUD 3733ae2067 PMD 0
[654405.518501] Oops: 0000 [#10] SMP
[654405.518504] Modules linked in: xt_multiport tcp_diag inet_diag act_police cls_basic sch_ingress scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_pkttype xt_state veth openvswitch xt_owner xt_conntrack iptable_filter iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler mpt2sas scsi_transport_sas raid_class
[654405.518555] CPU: 14 PID: 15732 Comm: guardian Tainted: G D L 4.1.6-clouder1 #1
[654405.518557] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0 07/09/2013
[654405.518559] task: ffff88373303e680 ti: ffff88369b388000 task.ti: ffff88369b388000
[654405.518560] RIP: 0010:[<ffffffff811824e5>] [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.518564] RSP: 0018:ffff88369b38bb48 EFLAGS: 00010282
[654405.518565] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[654405.518567] RDX: 00000000837ad864 RSI: 00000000000000d0 RDI: 0000000000018ce0
[654405.518568] RBP: ffff88369b38bb88 R08: ffff88407fcd8ce0 R09: ffffffff811c272c
[654405.518569] R10: ffff88369b38bb74 R11: ffff881f7c678db8 R12: ffff881fff807ac0
[654405.518570] R13: 0000000000028001 R14: ffff881fff807ac0 R15: 00000000000000d0
[654405.518572] FS: 00002b784bf66800(0000) GS:ffff88407fcc0000(0000) knlGS:0000000000000000
[654405.518574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[654405.518575] CR2: 0000000000028001 CR3: 000000364d574000 CR4: 00000000000406e0
[654405.518576] Stack:
[654405.518578] 000000013a481c58 0000000000000020 ffff883600010000 ffff88245528ca00
[654405.518580] ffffffff8120bc50 ffff881a3d3433c8 ffff88245528ca10 ffffffff81209ed0
[654405.518583] ffff88369b38bbc8 ffffffff811c272c ffff88245528ca10 0000000000000000
[654405.518586] Call Trace:
[654405.518593] [<ffffffff8120bc50>] ? proc_pid_follow_link+0x80/0x80
[654405.518596] [<ffffffff81209ed0>] ? sched_autogroup_open+0x50/0x50
[654405.518601] [<ffffffff811c272c>] single_open+0x3c/0xb0
[654405.518603] [<ffffffff81209eeb>] proc_single_open+0x1b/0x20
[654405.518606] [<ffffffff8119b69a>] do_dentry_open+0x22a/0x350
[654405.518608] [<ffffffff8119b809>] vfs_open+0x49/0x50
[654405.518612] [<ffffffff811ae652>] do_last+0x412/0x890
[654405.518615] [<ffffffff81182e4e>] ? kmem_cache_alloc+0xee/0x1c0
[654405.518620] [<ffffffff8129d6b6>] ? security_file_alloc+0x16/0x20
[654405.518623] [<ffffffff811aeb62>] path_openat+0x92/0x470
[654405.518626] [<ffffffff811ac753>] ? user_path_at_empty+0x63/0xa0
[654405.518628] [<ffffffff811aef8a>] do_filp_open+0x4a/0xa0
[654405.518633] [<ffffffff812fb140>] ? find_next_zero_bit+0x10/0x20
[654405.518637] [<ffffffff811bb64c>] ? __alloc_fd+0xac/0x150
[654405.518640] [<ffffffff8119ce9a>] do_sys_open+0x11a/0x230
[654405.518644] [<ffffffff8101190e>] ? syscall_trace_enter_phase1+0x14e/0x160
[654405.518650] [<ffffffff811274a3>] ? context_tracking_user_enter+0x13/0x20
[654405.518652] [<ffffffff8119cfee>] SyS_open+0x1e/0x20
[654405.518656] [<ffffffff815b0bee>] system_call_fastpath+0x12/0x71
[654405.518658] Code: 08 65 4c 03 05 5d 7c e8 7e 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 8c 00 00 00 48 85 c0 0f 84 83 00 00 00 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
[654405.518686] RIP [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.518689] RSP <ffff88369b38bb48>
[654405.518690] CR2: 0000000000028001
[654405.511613] BUG: unable to handle kernel paging request at 0000000000028001
[654405.511619] IP: [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.511628] PGD 3f9a016067 PUD 3ee598c067 PMD 0
[654405.511632] Oops: 0000 [#9] SMP
[654405.511634] Modules linked in: xt_multiport tcp_diag inet_diag act_police cls_basic sch_ingress scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_pkttype xt_state veth openvswitch xt_owner xt_conntrack iptable_filter iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler mpt2sas scsi_transport_sas raid_class
[654405.511684] CPU: 14 PID: 14914 Comm: templar.pl Tainted: G D L 4.1.6-clouder1 #1
[654405.511687] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0 07/09/2013
[654405.511689] task: ffff881f46d8bd80 ti: ffff883ee583c000 task.ti: ffff883ee583c000
[654405.511690] RIP: 0010:[<ffffffff811824e5>] [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.511694] RSP: 0018:ffff883ee583fe38 EFLAGS: 00010282
[654405.511695] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff881f3e1f8540
[654405.511697] RDX: 00000000837ad864 RSI: 00000000000080d0 RDI: 0000000000018ce0
[654405.511698] RBP: ffff883ee583fe78 R08: ffff88407fcd8ce0 R09: ffffffff8129028f
[654405.511699] R10: 0000000000000008 R11: 0000000000000246 R12: ffff881fff807ac0
[654405.511701] R13: 0000000000028001 R14: ffff881fff807ac0 R15: 00000000000080d0
[654405.511703] FS: 00002b06256163a0(0000) GS:ffff88407fcc0000(0000) knlGS:0000000000000000
[654405.511704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[654405.511706] CR2: 0000000000028001 CR3: 0000003f520c4000 CR4: 00000000000406e0
[654405.511707] Stack:
[654405.511708] 0000000100000404 0000000000000020 ffff883ee583fe78 0000000000000000
[654405.511711] 0000000000001000 0000000000000001 0000000000018003 0000000000000001
[654405.511715] ffff883ee583ff28 ffffffff8129028f 0000000000000001 00000000000007d0
[654405.511717] Call Trace:
[654405.511726] [<ffffffff8129028f>] do_shmat+0x22f/0x4a0
[654405.511729] [<ffffffff8129051c>] SyS_shmat+0x1c/0x30
[654405.511734] [<ffffffff815b0bee>] system_call_fastpath+0x12/0x71
[654405.511736] Code: 08 65 4c 03 05 5d 7c e8 7e 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 8c 00 00 00 48 85 c0 0f 84 83 00 00 00 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
[654405.511763] RIP [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.511765] RSP <ffff883ee583fe38>
[654405.511766] CR2: 0000000000028001
[654405.502947] BUG: unable to handle kernel paging request at 0000000000028001
[654405.502952] IP: [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.502961] PGD 1c7d1ba067 PUD 1d7c06d067 PMD 0
[654405.502965] Oops: 0000 [#8] SMP
[654405.502968] Modules linked in: xt_multiport tcp_diag inet_diag act_police cls_basic sch_ingress scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_pkttype xt_state veth openvswitch xt_owner xt_conntrack iptable_filter iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ioatdma dca ipmi_devintf ipmi_si ipmi_msghandler mpt2sas scsi_transport_sas raid_class
[654405.503021] CPU: 14 PID: 1342 Comm: gather_daemon.p Tainted: G D L 4.1.6-clouder1 #1
[654405.503024] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.0 07/09/2013
[654405.503026] task: ffff883dc1e170c0 ti: ffff881df4f80000 task.ti: ffff881df4f80000
[654405.503027] RIP: 0010:[<ffffffff811824e5>] [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.503031] RSP: 0018:ffff881df4f83a98 EFLAGS: 00010282
[654405.503033] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000001884e6d
[654405.503034] RDX: 00000000837ad864 RSI: 00000000000000d0 RDI: 0000000000018ce0
[654405.503035] RBP: ffff881df4f83ad8 R08: ffff88407fcd8ce0 R09: ffffffff811c272c
[654405.503037] R10: 0000000000000008 R11: 0000000000000001 R12: ffff881fff807ac0
[654405.503038] R13: 0000000000028001 R14: ffff881fff807ac0 R15: 00000000000000d0
[654405.503040] FS: 0000000000000000(0000) GS:ffff88407fcc0000(0063) knlGS:00000000558d2c00
[654405.503041] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[654405.503043] CR2: 0000000000028001 CR3: 0000001daa3cd000 CR4: 00000000000406e0
[654405.503044] Stack:
[654405.503046] ffff883a856c0402 0000000000000020 ffff881df4f83af8 ffff8825209b0b00
[654405.503049] ffffffff81212960 0000000000000000 ffffffff81212960 0000000000000000
[654405.503051] ffff881df4f83b18 ffffffff811c272c ffffffff81212960 0000000000000000
[654405.503054] Call Trace:
[654405.503063] [<ffffffff81212960>] ? get_iowait_time+0x70/0x70
[654405.503066] [<ffffffff81212960>] ? get_iowait_time+0x70/0x70
[654405.503070] [<ffffffff811c272c>] single_open+0x3c/0xb0
[654405.503073] [<ffffffff81212960>] ? get_iowait_time+0x70/0x70
[654405.503075] [<ffffffff81212960>] ? get_iowait_time+0x70/0x70
[654405.503077] [<ffffffff811c27f0>] single_open_size+0x50/0x90
[654405.503080] [<ffffffff811c1d20>] ? seq_release_private+0x60/0x60
[654405.503082] [<ffffffff8121286a>] stat_open+0x4a/0x60
[654405.503085] [<ffffffff81209574>] proc_reg_open+0x84/0x120
[654405.503088] [<ffffffff812094f0>] ? proc_entry_rundown+0xa0/0xa0
[654405.503091] [<ffffffff8119b69a>] do_dentry_open+0x22a/0x350
[654405.503093] [<ffffffff8119b809>] vfs_open+0x49/0x50
[654405.503097] [<ffffffff811ae652>] do_last+0x412/0x890
[654405.503102] [<ffffffff8100c299>] ? sched_clock+0x9/0x10
[654405.503107] [<ffffffff81084a7b>] ? sched_clock_cpu+0xab/0xc0
[654405.503110] [<ffffffff81182e4e>] ? kmem_cache_alloc+0xee/0x1c0
[654405.503115] [<ffffffff8129d6b6>] ? security_file_alloc+0x16/0x20
[654405.503118] [<ffffffff811aeb62>] path_openat+0x92/0x470
[654405.503122] [<ffffffff8108ff1f>] ? put_prev_task_fair+0x2f/0x50
[654405.503126] [<ffffffff810b2931>] ? lock_hrtimer_base+0x31/0x60
[654405.503128] [<ffffffff811aef8a>] do_filp_open+0x4a/0xa0
[654405.503132] [<ffffffff812fb140>] ? find_next_zero_bit+0x10/0x20
[654405.503136] [<ffffffff811bb64c>] ? __alloc_fd+0xac/0x150
[654405.503140] [<ffffffff8119ce9a>] do_sys_open+0x11a/0x230
[654405.503145] [<ffffffff810b9b2e>] ? getnstimeofday64+0xe/0x30
[654405.503150] [<ffffffff811274a3>] ? context_tracking_user_enter+0x13/0x20
[654405.503154] [<ffffffff811ee4cb>] compat_SyS_open+0x1b/0x20
[654405.503160] [<ffffffff815b2fc5>] sysenter_dispatch+0x7/0x25
[654405.503162] Code: 08 65 4c 03 05 5d 7c e8 7e 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 8c 00 00 00 48 85 c0 0f 84 83 00 00 00 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
[654405.503191] RIP [<ffffffff811824e5>] kmem_cache_alloc_trace+0x75/0x1d0
[654405.503194] RSP <ffff881df4f83a98>
[654405.503195] CR2: 0000000000028001
I have more like these, but I believe those are enough. The
following patterns stand out in these failures:
1. All of these failures happen when allocating a 32-byte struct,
which leads me to believe that the corruption is in the
kmalloc-32 slab cache.
2. Another thing that stands out is the faulting address:
the value 0000000000028001 shows up predominantly. For the oops
that ended in the panic, here is what the decoded code shows:
Code: 8b 00 48 c1 e8 38 41 39 c6 74 17 4c 89 c9 44 89 f2 8b 75 cc 4c 89 e7 e8 46 f6 ff ff 49 89 c5 eb 2b 90 49 63 44 24 20 49 8b 3c 24 <49> 8b 5c 05 00 48 8d 4a 01 4c 89 e8 65 48 0f c7 0f 0f 94 c0 3c
Code starting with the faulting instruction
===========================================
   0:   49 8b 5c 05 00          mov    0x0(%r13,%rax,1),%rbx
   5:   48 8d 4a 01             lea    0x1(%rdx),%rcx
   9:   4c 89 e8                mov    %r13,%rax
   c:   65 48 0f c7 0f          cmpxchg16b %gs:(%rdi)
  11:   0f 94 c0                sete   %al
  14:   3c                      .byte 0x3c
r13 takes part in the calculation of the address from which rbx is
loaded: r13 = 0000000000028001 and rax = 0, so the effective address
is 0000000000028001, which matches CR2.
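The address calculation in the faulting instruction can be checked
mechanically. What follows is a minimal Python sketch, not part of the
kernel or of any tool: the register values are transcribed by hand from
the four oops dumps quoted in this message, and the helper names
(effective_address, summarize, same_value) are made up for illustration.
It checks that r13 + rax equals CR2 in every dump and whether that
address is 8-byte aligned.

```python
# Register values transcribed by hand from the four oops dumps (#8..#11)
# quoted in this message. Keys are the oops sequence numbers.
OOPSES = {
    11: {"RAX": "0000000000000000", "R12": "ffff881fff807ac0",
         "R13": "0000000000028001", "CR2": "0000000000028001"},
    10: {"RAX": "0000000000000000", "R12": "ffff881fff807ac0",
         "R13": "0000000000028001", "CR2": "0000000000028001"},
    9:  {"RAX": "0000000000000000", "R12": "ffff881fff807ac0",
         "R13": "0000000000028001", "CR2": "0000000000028001"},
    8:  {"RAX": "0000000000000000", "R12": "ffff881fff807ac0",
         "R13": "0000000000028001", "CR2": "0000000000028001"},
}

MASK64 = (1 << 64) - 1


def effective_address(regs, base="R13", index="RAX", scale=1, disp=0):
    """Address encoded by disp(base,index,scale) in AT&T syntax, wrapped to 64 bits."""
    return (int(regs[base], 16) + int(regs[index], 16) * scale + disp) & MASK64


def summarize(oopses):
    """Return (oops number, address, matches CR2, 8-byte aligned) for each dump."""
    rows = []
    for number in sorted(oopses):
        regs = oopses[number]
        addr = effective_address(regs)
        rows.append((number, addr, addr == int(regs["CR2"], 16), addr % 8 == 0))
    return rows


def same_value(oopses, reg):
    """True if the given register holds the same value in every dump."""
    return len({regs[reg] for regs in oopses.values()}) == 1


for number, addr, matches_cr2, aligned in summarize(OOPSES):
    print(f"oops #{number}: addr={addr:#x} matches_cr2={matches_cr2} aligned={aligned}")
print("R12 identical in all dumps:", same_value(OOPSES, "R12"))
```

The address is odd, so it cannot be a normally aligned pointer to a
slab object. R12 is also identical in all four dumps; if R12 holds the
kmem_cache pointer at that point (an assumption based on the bytes just
before the faulting instruction, not something verified here), then all
four faults hit the same cache.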
Any ideas on how to debug this? The first thing that comes to mind is
to boot the machine with slab merging disabled, in the hope
that this would narrow the scope of the memory corruption, so that
the next time it occurs it would be easier to identify the culprit.
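To get a feel for how much merging widens the set of suspects, one
could list which caches currently share storage with kmalloc-32. This
is a rough Python sketch under two assumptions: that SLUB's sysfs
directory /sys/kernel/slab is present on the machine, and that merged
caches show up there as symbolic links to a common directory. The
function name caches_merged_with is invented for this example.

```python
import os


def caches_merged_with(cache, root="/sys/kernel/slab"):
    """Return the sorted names of all symlinked caches that resolve to the
    same directory as `cache` (including `cache` itself if it is a symlink).

    Assumes merged SLUB caches appear under `root` as symlinks to one
    shared directory; an unmerged cache is a real directory and yields [].
    """
    target = os.path.realpath(os.path.join(root, cache))
    merged = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Only symlinks are aliases; real directories are the backing caches.
        if os.path.islink(path) and os.path.realpath(path) == target:
            merged.append(name)
    return merged
```

On the affected machine this would be called as
caches_merged_with("kmalloc-32"). Every name it returns is a cache whose
users could have scribbled over a kmalloc-32 object. Booting with
slub_nomerge on the kernel command line keeps such caches separate.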
Here are the config options for the allocator in use:
grep -i slub kernel-conf-4.1
# CONFIG_SLUB_DEBUG is not set
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SLUB_STATS is not set
If more information is needed I'm happy to provide it.
Any help will be much appreciated.
Regards,
Nikolay