Debug hints for fpu state NULL pointer dereference on context switch during core dump in 3.0.101

From: Lennart Sorensen
Date: Mon Dec 19 2016 - 13:09:46 EST

Next message: George Spelvin: "RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF"
Previous message: Borislav Petkov: "Re: [PATCH] x86/microcode: Adjust ramdisk address when accessing by virtual address"
Next in thread: Lennart Sorensen: "Re: Debug hints for fpu state NULL pointer dereference on context switch during core dump in 3.0.101"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I am trying to debug a problem that has been happening occationally for
years on some of our systems running 3.0.101 kernel (yes I know it is
old, we are moving to 4.9 at the moment but I would like older releases
to be fixed too, assuming 4.9 makes this problem disappear).

What is happening is that once in a while a process does something wrong
and segfaults, and dumps core. We have a handler to process the core dump
to name it and compress it and make sure we don't keep to many around,
so the core_pattern uses the pipe option to pipe the dump to a shell
script that saves it with the pid and current timestamp and gzips it.

Once in a while when this happens, the kernel hits a null pointer
dereference in fpu.state->xsave while doing __switch_to.

The system ix x86_64 with dual E5-2620 CPUs (6 cores each with
hyperthreading). Some people think they have seen it on other systems,
but are not sure. I have not been able to trigger it on other systems
yet.

It used to take about a week of running tests to trigger it, but I have
now managed to hit it in a few minutes pretty reliably.

The way I trigger it is:

abuse.c:

#include <sys/types.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main() {
pid_t pid;

pid = getpid();

printf("%u: pid / PI = %f\n", pid, pid / M_PI);

kill(pid, SIGSEGV);
return 0;
}

I then run:
for i in `seq 1 1 10000`; do ./abuse & echo -n; done

That pretty reliably hits the problem.

The crash dump we get is:
Dec 19 12:24:03 HWC-64 kernel: BUG: unable to handle kernel NULL pointer dereference at 000000000000033f
Dec 19 12:24:03 HWC-64 kernel: IP: [<ffffffff81000bbc>] __switch_to+0x4c/0x2b0
Dec 19 12:24:03 HWC-64 kernel: PGD 39a6ce067 PUD 39a8cf067 PMD 0
Dec 19 12:24:03 HWC-64 kernel: Oops: 0002 [#1] SMP
Dec 19 12:24:03 HWC-64 kernel: CPU 12
Dec 19 12:24:03 HWC-64 kernel: Modules linked in: dpi_drv(P) ccu_util(P) ipv4_mb(P) l2bridge_config_util(P) l2_config_util(P) route_config_util(P) qos_config_util(P) sysapp_common(P) chantry_fwd_eng_2800_config(P) shim_module(P) sadb_cc(P) ipsecXformer(P) libeCrypto(P) ipmatch_cc(P) l2h_cc(P) ndproxy_cc(P) arpint_cc(P) portinfo_cc(P) chantryqos_cc(P) redirector_cc(P) ix_ph(P) fpm_core_cc(P) pulse_cc(P) vnstt_cc(P) vnsap_cc(P) fm_cc(P) rutm_cc(P) mutm_cc(P) ethernet_tx_cc(P) stkdrv_cc(P) l2bridge_cc(P) events_util(P) sched_cc(P) qm_cc(P) ipv4_cc(P) wred_cc(P) tc_meter_cc(P) dscp_classifier_cc(P) classifier_6t_cc(P) ent586_cc(P) dev_cc_arp(P) chantry_fwd_eng_2800_tables(P) ether_arp_lib(P) rtmv4_lib(P) lkup_lib(P) l2tm_lib(P) fragmentation_lib(P) properties_lib(P) msg_support_lib(P) utilities_lib(P) cci_lib(P) rm_lib(P) libossl vip productSpec_x86_dp(P) ixgbe igb
Dec 19 12:24:03 HWC-64 kernel:
Dec 19 12:24:03 HWC-64 kernel: Pid: 16440, comm: coremgr.sh Tainted: P 3.0.101 #6 Intel Corporation S2600GZ ........../S2600GZ
Dec 19 12:24:03 HWC-64 kernel: RIP: 0010:[<ffffffff81000bbc>] [<ffffffff81000bbc>] __switch_to+0x4c/0x2b0
Dec 19 12:24:03 HWC-64 kernel: RSP: 0018:ffff88042f2c9de8 EFLAGS: 00010002
Dec 19 12:24:03 HWC-64 kernel: RAX: 00000000ffffffff RBX: ffff8803d8266ae0 RCX: 000000000000000c
Dec 19 12:24:03 HWC-64 kernel: RDX: 00000000ffffffff RSI: ffff88042f29b840 RDI: 0000000000000000
Dec 19 12:24:03 HWC-64 kernel: RBP: ffff88043f38da40 R08: ffff8803d8266e08 R09: 0000000100018011
Dec 19 12:24:03 HWC-64 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000000
Dec 19 12:24:03 HWC-64 kernel: R13: ffff88042ce62080 R14: 000000000000000c R15: ffff88042f29b840
Dec 19 12:24:03 HWC-64 kernel: FS: 00007f72f71a5700(0000) GS:ffff88043f380000(0000) knlGS:0000000000000000
Dec 19 12:24:03 HWC-64 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 19 12:24:03 HWC-64 kernel: CR2: 000000000000033f CR3: 00000003d8c70000 CR4: 00000000000406e0
Dec 19 12:24:03 HWC-64 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 19 12:24:03 HWC-64 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 19 12:24:03 HWC-64 kernel: Process coremgr.sh (pid: 16440, threadinfo ffff88039a00c000, task ffff8803d8266ae0)
Dec 19 12:24:03 HWC-64 kernel: Stack:
Dec 19 12:24:03 HWC-64 kernel: 0000000000000018 ffff88043f38fe40 ffff88043f38fe40 ffff88042f29b840
Dec 19 12:24:03 HWC-64 kernel: 0000000000000000 ffffffff81301c94 ffff88042f29b840 0000000000000046
Dec 19 12:24:03 HWC-64 kernel: ffff88042f2c9fd8 ffff88042f2c9fd8 000000000000a000 0000000000000000
Dec 19 12:24:03 HWC-64 kernel: Call Trace:
Dec 19 12:24:03 HWC-64 kernel: Code: 48 63 c1 48 03 14 c5 40 a0 67 81 8b 87 b8 03 00 00 48 89 d5 85 c0 74 37 66 66 90 66 90 b8 ff ff ff ff 48 8b bf c0 03 00 00 89 c2
Dec 19 12:24:03 HWC-64 kernel: <48> 0f ae 37 48 8b 83 c0 03 00 00 f6 80 00 02 00 00 01 0f 84 88
Dec 19 12:24:03 HWC-64 kernel: RIP [<ffffffff81000bbc>] __switch_to+0x4c/0x2b0
Dec 19 12:24:03 HWC-64 kernel: RSP <ffff88042f2c9de8>
Dec 19 12:24:03 HWC-64 kernel: CR2: 000000000000033f
Dec 19 12:24:03 HWC-64 kernel: ---[ end trace 366732e1020fb678 ]---

That is calling fpu_xsave in fpu_save_init, since it dereferences
fpu->state->xsave which is somehow NULL even though the process apparently
is flagged as using the FPU. Of course the process was just created so
I am wondering if there is (was?) a race condition in fork that could
somehow be triggered when handling a core dump.

coremgr.sh is the script that is handling the core dump. Sometimes it
is the task listed, sometimes one of the things it calls is the process
listed in the dump. It is always something to do with the core dump
handling that is listed as the current process when this happens.

Given I can't just insert printk's in the context switch code (comments
clearly say not to do so), I am wondering what else I can try to get to
the bottom of how this odd state could happen.

Any suggestions where to poke next given at least I can now trigger it
pretty fast?

--
Len Sorensen

Next message: George Spelvin: "RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF"
Previous message: Borislav Petkov: "Re: [PATCH] x86/microcode: Adjust ramdisk address when accessing by virtual address"
Next in thread: Lennart Sorensen: "Re: Debug hints for fpu state NULL pointer dereference on context switch during core dump in 3.0.101"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]