Re: recent -git: BUG in free_thread_xstate

From: Max Krasnyansky
Date: Wed Jul 23 2008 - 18:42:20 EST


Vegard Nossum wrote:
On Wed, Jul 23, 2008 at 10:31 PM, Suresh Siddha
<suresh.b.siddha@xxxxxxxxx> wrote:
On Wed, Jul 23, 2008 at 01:07:04PM -0700, Vegard Nossum wrote:
Hi,

I just got this on c010b2f76c3032e48097a6eef291d8593d5d79a6 (-git from
yesterday):
Do you see this in 2.6.26 as well? I suspect it is coming from post-2.6.26
changes.

Yep. Got this on 2.6.26 now:

BUG: unable to handle kernel paging request at 00664381
IP: [<c010b884>] free_thread_xstate+0x4/0x30
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3796, comm: bash Not tainted (2.6.26 #1)
EIP: 0060:[<c010b884>] EFLAGS: 00210246 CPU: 0
EIP is at free_thread_xstate+0x4/0x30
EAX: 00664001 EBX: f3870000 ECX: 00000004 EDX: f4b544e8
ESI: f4bdef28 EDI: c07feda0 EBP: f5325bd0 ESP: f5325bcc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process bash (pid: 3796, ti=f5324000 task=f4b53fc0 task.ti=f5324000)
Stack: f3870000 f5325bdc c010b8bd f4bddfa0 f5325be8 c0132b89 f4bddfa0 f5325bf4
c0133fd1 f4b77e00 f5325bfc c01368a7 f5325c14 c0172b8c 00200282 c0752b40
00000001 00000009 f5325c30 c0139cd3 c0803d00 c0803d00 c0803d00 00200046
Call Trace:
[<c010b8bd>] ? free_thread_info+0xd/0x20
[<c0132b89>] ? free_task+0x19/0x30
[<c0133fd1>] ? __put_task_struct+0x51/0xa0
[<c01368a7>] ? delayed_put_task_struct+0x27/0x30
[<c0172b8c>] ? rcu_process_callbacks+0x6c/0xb0
[<c0139cd3>] ? __do_softirq+0x83/0x100
[<c0139df5>] ? do_softirq+0xa5/0xb0
[<c0139f95>] ? irq_exit+0x95/0xa0
[<c0107e4d>] ? do_IRQ+0x4d/0xa0
[<c01057b2>] ? common_interrupt+0x2e/0x34
[<c013549e>] ? vprintk+0x1be/0x420
[<c010aea5>] ? native_sched_clock+0xb5/0x110
[<c010aea5>] ? native_sched_clock+0xb5/0x110
[<c013571b>] ? printk+0x1b/0x20
[<c012cbec>] ? cpu_attach_domain+0x3ec/0x410
[<c010aea5>] ? native_sched_clock+0xb5/0x110
[<c01979e1>] ? check_bytes_and_report+0x21/0xc0
[<c0197d8f>] ? check_object+0xdf/0x1f0
[<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
[<c0157895>] ? mark_held_locks+0x65/0x80
[<c0199055>] ? kfree+0xb5/0x120
[<c0157a24>] ? trace_hardirqs_on+0xd4/0x160
[<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
[<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
[<c0126c17>] ? sd_free_ctl_entry+0x37/0x50
[<c012cc3e>] ? detach_destroy_domains+0x2e/0x50
[<c012cc9b>] ? update_sched_domains+0x3b/0x50
[<c014d467>] ? notifier_call_chain+0x37/0x70
[<c014d4d9>] ? __raw_notifier_call_chain+0x19/0x20
[<c055c858>] ? _cpu_down+0x78/0x240
[<c015d92f>] ? cpu_maps_update_begin+0xf/0x20
[<c055ca4b>] ? cpu_down+0x2b/0x40
[<c055dc69>] ? store_online+0x39/0x80
[<c055dc30>] ? store_online+0x0/0x80
[<c02faf6b>] ? sysdev_store+0x2b/0x40
[<c01dcdd2>] ? sysfs_write_file+0xa2/0x100
[<c019eb76>] ? vfs_write+0x96/0x130
[<c01dcd30>] ? sysfs_write_file+0x0/0x100
[<c019f23d>] ? sys_write+0x3d/0x70
[<c0104cdb>] ? sysenter_past_esp+0x78/0xd1
=======================
Code: 04 00 00 00 00 c7 04 24 00 00 04 00 e8 96 f8 08 00 a3 b4 a5 80
c0 c9 c3 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 53 <8b>
90 80 03 00 00 89 c3 85 d2 74 14 a1 b4 a5 80 c0 e8 d6 e4 08
EIP: [<c010b884>] free_thread_xstate+0x4/0x30 SS:ESP 0068:f5325bcc
Kernel panic - not syncing: Fatal exception in interrupt

I'm not sure what to make of this. It looks related to the rebuilding
of sched domains that we saw earlier. But this reproduces on both
v2.6.26 and latest -git (though not with that backtrace).

Based on the trace above, it seems that we panic even before calling into cpusets (i.e. I do not see rebuild_sched_domains() in there), which means it must be something different. The problem we had before was that cpusets were screwing up the domain rebuild sequence during cpu hotplug handling.
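
For anyone who wants to do the same check mechanically, here is a minimal sketch that pulls the symbol names out of an oops call trace and tests whether a given function appears. The helper name trace_symbols and the regex are my own for illustration, not part of any kernel tooling, and the sample is only a few frames copied from the trace above. Keep in mind that frames marked with "?" are best-effort entries, so a symbol's absence is a hint rather than proof.

```python
import re

# Matches oops call-trace lines such as
# "[<c012cc9b>] ? update_sched_domains+0x3b/0x50".
# The "?" marker is optional so that lines like the EIP line also match.
TRACE_LINE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_symbols(oops_text):
    """Return the function names found in an oops call trace, in order."""
    return [m.group(1) for m in TRACE_LINE.finditer(oops_text)]


# A few frames copied from the trace above (not the full trace).
SAMPLE = """\
[<c012cc3e>] ? detach_destroy_domains+0x2e/0x50
[<c012cc9b>] ? update_sched_domains+0x3b/0x50
[<c014d467>] ? notifier_call_chain+0x37/0x70
[<c055c858>] ? _cpu_down+0x78/0x240
"""

symbols = trace_symbols(SAMPLE)
print("rebuild_sched_domains" in symbols)  # cpusets entry point
print("update_sched_domains" in symbols)   # hotplug notifier path
```

Feeding it the whole oops text instead of SAMPLE works the same way, since the pattern is matched per frame.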

Max