Hard LOCKUP with 2.6.32.28 (maybe scheduler/tick related?)

From: Sebastian Färber
Date: Mon Jan 31 2011 - 06:06:11 EST


Hi,

I recently upgraded some servers from 2.6.32.9 to 2.6.32.28 and now see
frequent hard lockups on a few of them. I've compiled a kernel with
debugging support and enabled the NMI watchdog to get more information.
I've attached my .config and the stack traces from the NMI watchdog,
captured via a serial console.
To me it looks like there is a problem in run_posix_cpu_timers, and the
same problem also triggers "WARNING: at kernel/sched_fair.c:979
hrtick_start_fair".

Note that the kernel is patched with grsecurity and I'm running with
CONFIG_NO_HZ enabled. There were no problems with 2.6.32.9.
It would be great if someone could have a look at this. I can provide
more information if necessary.

Regards,

Sebastian

---
A small excerpt from the crash log (see the attached "crash" file for full details):

BUG: NMI Watchdog detected LOCKUP on CPU1, ip 003cd283, registers:
Modules linked in: coretemp i2c_dev i2c_i801 ipt_LOG xt_limit
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_NOTRACK iptable_raw
ipt_REJECT iptable_filter nf_conntrack_ftp nf_conntrack sky2 usbcore
ipv6
Pid: 28695, comm: php4-STABLE-STA Tainted: G D (2.6.32.28-grsec-debug #2) 965GM-DS2
EIP: 0060:[<003cd283>] EFLAGS: 00200097 CPU: 1
EIP is at _spin_lock+0x13/0x20
EAX: f5938a84 EBX: f65491c0 ECX: 0000ebeb EDX: 0000cccb
ESI: f65493a0 EDI: 00000000 EBP: f495e954 ESP: f495e954
DS: 0068 ES: 0068 FS: 00d8 GS: 007b SS: 0068
Process php4 (pid: 28695, ti=f495e000 task=f65491c0 task.ti=f495e000)
Stack:
f495e9bc 000594d1 f495e97c 0002f975 00000001 f495e9a0 0000225d c1e851b8
<0> f65491c0 c1e851b8 f495e994 00031dde 00000000 c18226c0 00200086 00000120
<0> 00000139 7d3ab034 00000001 f495e9a0 f495e9a0 f65491c0 00000001 00000000
Call Trace:
[<000594d1>] ? run_posix_cpu_timers+0xc1/0x8c0
[<0002f975>] ? sched_slice+0x55/0xa0
[<0000225d>] ? do_one_initcall+0x15d/0x170
[<00031dde>] ? task_tick_fair+0xfe/0x140
[<00200086>] ? fb_get_mode+0x146/0x350
[<0004cb79>] ? update_process_times+0x49/0x60
[<00064f5f>] ? tick_periodic+0x2f/0x80
[<00064fc9>] ? tick_handle_periodic+0x19/0x90
[<0001d452>] ? smp_apic_timer_interrupt+0x52/0x90
[<000050f2>] ? apic_timer_interrupt+0x52/0x60
[<00008a16>] ? die_nmi+0xd6/0xe0
[<00200206>] ? fb_get_mode+0x2c6/0x350
[<003cd27d>] ? _spin_lock+0xd/0x20
[<00200046>] ? fb_get_mode+0x106/0x350
[<0001e13d>] ? nmi_watchdog_tick+0x12d/0x1b0

Attachment: config

Attachment: crash