Re: [git pull] scheduler updates for v2.6.24
From: Gabriel C
Date: Tue Oct 16 2007 - 18:17:50 EST
Ingo Molnar wrote:
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On Mon, 15 Oct 2007 16:17:23 +0200
>> Ingo Molnar <mingo@xxxxxxx> wrote:
>>
>>> Linus, please pull the latest scheduler git tree from:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git
>> Did Paul Jackson's crash get fixed?
>
> yes - that crash was a showstopper that was holding up the pull request
> for 2 days. Paul bisected it down to the culprit and the fix was to do
> this in wake_up_new_task():
>
> - if (!p->sched_class->task_new || !current->se.on_rq) {
> + if (!p->sched_class->task_new || !current->se.on_rq || !rq->cfs.curr) {
>
> (during early bootup the cfs_rq has no curr pointer yet.) It's not clear
> why this race did not trigger earlier. (and the two checks can probably
> be consolidated into a single "!rq->cfs.curr" condition.)
Maybe not related to that but now my box is killed after this merge.
When I do not much on the box I get maybe 6h uptime , by doing some work ( compiling etc ) is random freeze.
I was able to capture the OOps finally :
...
[15692.917111] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000044
[15692.917159] printing eip:
[15692.917174] c0111f90
[15692.917185] *pde = 00000000
[15692.917200] Oops: 0000 [#1]
[15692.917208] PREEMPT SMP
[15692.917240] Modules linked in: fuse netconsole configfs pc87360 hwmon_vid eeprom adm1021 uhci_hcd sr_mod shpchp pci_hotplug ohci_hcd iTCO_wdt iTCO_vendor_support intel_agp i82860_edac i2c_i801 ehci_hcd usbcore edac_core cdrom agpgart 3c59x mii ext4dev jbd2 capability commoncap loop lp parport_pc parport evdev
[15692.917623] CPU: 0
[15692.917625] EIP: 0060:[<c0111f90>] Not tainted VLI
[15692.917629] EFLAGS: 00010046 (2.6.23-g65a6ec0d #330)
[15692.917661] EIP is at pick_next_task_fair+0x1f/0x2d
[15692.917672] eax: c150a7b8 ebx: 00000000 ecx: 00000000 edx: 00000000
[15692.917689] esi: c1507a48 edi: 00000000 ebp: 00eaaf7a esp: cb1fdf14
[15692.917701] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
[15692.917715] Process sed (pid: 28999, ti=cb1fc000 task=cfdc3500 task.ti=cb1fc000)
[15692.917725] Stack: c02f8268 c02ef7b5 00000002 cb1fdf58 cb1fdf50 00000000 c0400f38 c0403780
[15692.917833] cfdc3500 cfdc3634 c150a780 00000000 c011a8e7 00000000 c1077aa0 000000ff
[15692.917942] 00000000 00000000 00000000 cb1fdf8c 00000010 cfdc3500 cb1fdf8c c011ace5
[15692.918048] Call Trace:
[15692.918072] [<c02ef7b5>] schedule+0x321/0x58f
[15692.918109] [<c011a8e7>] do_exit+0x293/0x6c6
[15692.918143] [<c011ace5>] do_exit+0x691/0x6c6
[15692.918169] [<c011ad87>] sys_exit_group+0x0/0xd
[15692.918195] [<c01026e6>] sysenter_past_esp+0x5f/0x85
[15692.918232] =======================
[15692.918244] Code: 8b 53 28 89 43 34 89 53 38 5b 5e c3 53 31 d2 83 78 40 00 74 20 83 c0 38 8b 50 20 31 db 85 d2 74 0a 8d 5a f8 89 da e8 a9 ff ff ff <8b> 43 44 85 c0 75 e6 8d 53 d0 89 d0 5b c3 57 56 53 89 c6 89 d7
[15692.918981] EIP: [<c0111f90>] pick_next_task_fair+0x1f/0x2d SS:ESP 0068:cb1fdf14
...
After that the box is death need to hard reset it.
Interesting thing is when I compile the kernel with debug I don't get that ( or maybe its need longer to triggers it ? )
Config , lspci , dmesg , hardware specs , Oops message , and the top output when it Oops'ed there :
http://194.231.229.228/lara/
>
> Ingo
Regards,
Gabriel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/