Re: Early boot hang on recent 2.6 kernels (> 2.6.3), on x86-64 with 16gb of RAM

From: Robin Lee Powell
Date: Mon Sep 18 2006 - 19:59:24 EST


On Mon, Sep 18, 2006 at 09:50:41AM +0200, Andi Kleen wrote:
> Robin Lee Powell <rlpowell@xxxxxxxxxxxxxxxxxx> writes:
> >
> > This version is rather different, as it ends in:
> >
> > HARDWARE ERROR
> > CPU 0: Machine Check Exception: 7 Bank 3: b40000000000083b
> > RIP 10:<ffffffff80446e3e> {pci_conf1_read+0xbe/0xf0}
> > TSC 2e7932dbf8 ADDR fdfc000cfc
> > This is not a software problem!
> > Run through mcelog --ascii to decode and contact your hardware vendor
> > Kernel panic - not syncing: Uncorrected machine check
>
> Decoded it gives
>
> ..
> bus error 'local node origin, request didn't time out
> data read mem transaction
> i/o access, level generic'
> ..
>
> It will probably boot with mce=off acpi=off pci=conf1
>
> You got some buggy device that causes a bus timeout when its config space
> is read. The old kernel most likely didn't touch it by luck.
>
> Please add the following patch and send the whole log.
> This will tell us which device has this problem.

Done; it's at
http://teddyb.org/~rlpowell/media/regular/lkml/hacked-boot.txt

Note that I had to use "mce=off acpi=off pci=conf1" to get any of
that hack's output to show up at all; I wasn't clear whether you
intended that or not.
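
For anyone following along, the decode quoted above can be reproduced by
hand from the raw status value in the panic ("Bank 3: b40000000000083b").
The following is only a rough sketch, not what mcelog actually runs: it
assumes the architectural x86 MCi_STATUS flag bits (63 down to 57) and the
bus-error code layout 0000 1PPT RRRR IILL in the low 16 bits. The function
name, dict keys and label strings are made up for illustration.

```python
# Sketch: decode the machine-check status word from the panic message.
# Assumes the architectural MCi_STATUS flag bits and the bus/interconnect
# error-code layout 0000 1PPT RRRR IILL; all names here are illustrative.

STATUS = 0xB40000000000083B  # "Bank 3: b40000000000083b" from the log

PARTICIPATION = {0: "local node origin", 1: "local node response",
                 2: "local node observed", 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read", 2: "generic write",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "evict", 8: "snoop"}
TARGET = {0: "memory access", 1: "reserved", 2: "i/o access", 3: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "level generic"}


def bit(value, n):
    """Return bit n of value as 0 or 1."""
    return (value >> n) & 1


def decode_status(status):
    """Split a 64-bit status word into flag bits and error-code fields."""
    code = status & 0xFFFF
    fields = {
        "valid": bool(bit(status, 63)),
        "overflow": bool(bit(status, 62)),
        "uncorrected": bool(bit(status, 61)),
        "enabled": bool(bit(status, 60)),
        "misc_valid": bool(bit(status, 59)),
        "addr_valid": bool(bit(status, 58)),
        "context_corrupt": bool(bit(status, 57)),
        "error_code": code,
    }
    # Bus errors have bits 15:11 equal to 00001.
    if (code & 0xF800) == 0x0800:
        fields["kind"] = "bus error"
        fields["participation"] = PARTICIPATION[(code >> 9) & 0x3]  # PP
        fields["timed_out"] = bool((code >> 8) & 0x1)               # T
        fields["request"] = REQUEST.get((code >> 4) & 0xF, "reserved")  # RRRR
        fields["target"] = TARGET[(code >> 2) & 0x3]                # II
        fields["level"] = LEVEL[code & 0x3]                         # LL
    return fields


if __name__ == "__main__":
    for name, value in decode_status(STATUS).items():
        print(f"{name:16} {value}")
```

With this layout the low 16 bits (0x083b) come out as a bus error with
local node origin, no timeout, data read, i/o access, level generic,
which agrees with the decode quoted above. The uncorrected bit is set,
matching the "Uncorrected machine check" panic, and the address-valid
bit is set, which is consistent with the panic carrying an ADDR value.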

-Robin

--
http://www.digitalkingdom.org/~rlpowell/ *** http://www.lojban.org/
Reason #237 To Learn Lojban: "Homonyms: Their Grate!"
Proud Supporter of the Singularity Institute - http://singinst.org/