Re: [bisected] 2.6.31 regression: fails to boot as xen guest

From: Jeremy Fitzhardinge
Date: Tue Aug 25 2009 - 15:30:51 EST


On 08/25/09 11:38, Arnd Hannemann wrote:
> Jeremy Fitzhardinge wrote:
>
>> On 08/25/09 11:25, Ingo Molnar wrote:
>>
>>> * Arnd Hannemann <hannemann@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>> Pekka Enberg wrote:
>>>>
>>>>
>>>>> On Tue, 2009-08-25 at 19:49 +0200, Arnd Hannemann wrote:
>>>>>
>>>>>
>>>>>> Hi Pekka,
>>>>>>
>>>>>> Pekka Enberg wrote:
>>>>>>
>>>>>>
>>>>>>> On Tue, 2009-08-25 at 18:49 +0200, Arnd Hannemann wrote:
>>>>>>>
>>>>>>>
>>>>>>>>> Thanks for doing the bisect! Can we also see your
>>>>>>>>> .config?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Config for -rc7 is attached. My bisect configs were based
>>>>>>>> on that
>>>>>>>>
>>>>>>>>
>>>>>>> Thanks! While we wait for the Xen people, you can try the
>>>>>>> following patch to see if we can narrow the bug down to
>>>>>>> trap_init().
>>>>>>>
>>>>>>>
>>>>>> Yes seems to be trap_init(). -rc7 with this patch applied
>>>>>> boots up to the prompt.
>>>>>>
>>>>>>
>>>>> Thanks for testing! Ingo, what do you think of the following
>>>>> patch? AFAICT, x86-32 is the only architecture playing with
>>>>> traps in mem_init() so this should be the safest fix for
>>>>> 2.6.31.
>>>>>
>>>>>
>>>> Hmm, -rc7 + this fix does not work for me :-/ Still hangs before
>>>> any output...
>>>>
>>>>
>>> does earlyprintk=vga tell you anything about precisely where it
>>> hangs?
>>>
>>>
>> It's a Xen domain, so it should be earlyprintk=xen
>>
>> J
>>
>>
> Here is the output with earlyprintk=xen and the second patch from Pekka
> applied:
>
> (early) [ 0.000000] Initializing CPU#0
> (early) [ 0.000000] Checking if this processor honours the WP bit
> even in supervisor mode...(early)
> (early) [ 0.000000] BUG: unable to handle kernel (early) NULL pointer
> dereference(early) at (null)
> (early) [ 0.000000] IP:(early) [<c1192993>]
> xen_evtchn_do_upcall+0xd3/0x160
>

OK, that's the same problem I've seen: it's trying to enable and deliver
interrupts before init_IRQ has been called, so the various allocated
arrays aren't set up.
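
To illustrate the ordering problem, here is a minimal user-space sketch in
C. It is not the kernel code: every name in it (model_init_irq,
model_bind, model_upcall, the table, and its size) is hypothetical and
only models "a lookup table that exists after init, and an upcall that may
arrive before it". The NULL check in model_upcall marks the point where an
unguarded version would dereference a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_PORTS 8        /* hypothetical table size, illustration only */

enum {
    MODEL_UNBOUND   = -1,       /* table exists, port has no IRQ bound */
    MODEL_TOO_EARLY = -2        /* upcall arrived before the table was allocated */
};

/* Stand-in for a port-to-IRQ table that is only allocated during IRQ init. */
static int *model_port_to_irq;

/* Models the init step: allocate the table and mark every port unbound. */
static bool model_init_irq(void)
{
    model_port_to_irq = malloc(MODEL_NR_PORTS * sizeof(*model_port_to_irq));
    if (!model_port_to_irq)
        return false;
    for (size_t i = 0; i < MODEL_NR_PORTS; i++)
        model_port_to_irq[i] = MODEL_UNBOUND;
    return true;
}

/* Binds a port to an IRQ number; only valid once the table exists. */
static bool model_bind(unsigned int port, int irq)
{
    if (!model_port_to_irq || port >= MODEL_NR_PORTS)
        return false;
    model_port_to_irq[port] = irq;
    return true;
}

/* Models an upcall: look up the IRQ for a pending port. */
static int model_upcall(unsigned int port)
{
    /* Without this check, an early upcall would read through a NULL table. */
    if (!model_port_to_irq)
        return MODEL_TOO_EARLY;
    if (port >= MODEL_NR_PORTS)
        return MODEL_UNBOUND;
    return model_port_to_irq[port];
}
```

In this model, calling model_upcall() before model_init_irq() returns
MODEL_TOO_EARLY instead of crashing. Whether the real fix should guard the
upcall or simply keep interrupts disabled until init_IRQ has run is the
open question.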

Ingo, I'm assuming that interrupts aren't supposed to be enabled this early?

Thanks,
J

> (early) [ 0.000000] *pdpt = 0000000008386001 (early)
> (early) [ 0.000000] Thread overran stack, or stack corrupted
> (early) [ 0.000000] Oops: 0000 [#1] (early) SMP (early)
> (early) [ 0.000000] last sysfs file:
> (early) [ 0.000000] Modules linked in:(early)
> (early) [ 0.000000]
> (early) [ 0.000000] Pid: 0, comm: swapper Not tainted
> (2.6.31-rc7-pae-um #10)
> (early) [ 0.000000] EIP: 0061:[<c1192993>] EFLAGS: 00010046 CPU: 0
> (early) [ 0.000000] EIP is at xen_evtchn_do_upcall+0xd3/0x160
> (early) [ 0.000000] EAX: 00000004 EBX: 00000000 ECX: 00000004 EDX:
> ffffffff
> (early) [ 0.000000] ESI: fffffffe EDI: 00000000 EBP: 00000000 ESP:
> c1413e64
> (early) [ 0.000000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: e021
> (early) [ 0.000000] Process swapper (pid: 0, ti=c1412000
> task=c13d11a0 task.ti=c1412000)
> (early) [ 0.000000] Stack:
> (early) [ 0.000000] f5793000(early) c146d9f0(early)
> c146d9f0(early) 00000000(early) c1413e9c(early) 00000000(early)
> 00000000(early) c3a01020(early)
> (early) [ 0.000000] <0>(early) 00000000(early) eec06067(early)
> c000cff8(early) 00000000(early) 00000000(early) c10086d7(early)
> eec06067(early) c000cff8(early)
> (early) [ 0.000000] <0>(early) f55ff000(early) c000cff8(early)
> 00000000(early) 00000000(early) c13d7d60(early) c101e021(early)
> c141e021(early) c10100d8(early)
> (early) [ 0.000000] Call Trace:
> (early) [ 0.000000] [<c10086d7>] ? xen_do_upcall+0x7/0xc
> (early) [ 0.000000] [<c101e021>] ? ptep_set_access_flags+0x1/0x80
> (early) [ 0.000000] [<c141e021>] ? find_e820_area_size+0x51/0x330
> (early)
>
>
> Best regards,
> Arnd
>
>
