Re: HPET regression in 2.6.26 versus 2.6.25 -- connection between HPET and lockups found

From: Ingo Molnar
Date: Mon Aug 18 2008 - 21:15:33 EST



* David Witbrodt <dawitbro@xxxxxxxxxxxxx> wrote:

> The output I get when the kernel locks up looks perfectly OK, except
> maybe for the address of hpet_res (which I am not knowledgeable enough
> to judge):
>
> Data from arch/x86/kernel/acpi/boot.c:
> hpet_res = ffff88000100f000 broken_bios: 0
> sequence = 0 insert_resource() returned: 0
>
>
> I see some recent (Aug. 2008) discussion of alloc_bootmem() being
> broken, so maybe that is related to my problem.
>
> Does this connection between HPET and insert_resource() look
> meaningful, or is this a coincidence?

It is definitely the angle I'd suspect the most.

Perhaps we stomp over some piece of memory that is "available RAM"
according to your BIOS but is in reality used by something else. With
previous kernels we got lucky and put a data structure there that kept
your HPET working. (A bit far-fetched I think, but it is the best
theory I could come up with.)

The address you printed out (0xffff88000100f000) does look _somewhat_
suspicious. It corresponds to the physical address 0x100f000, which is
_just_ above the 16MB boundary (16MB + 60KB). It should not normally be
relevant, but it is still somewhat suspicious.
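To make the arithmetic concrete, here is a small userspace C sketch of the translation. It assumes the direct mapping starts at 0xffff880000000000, which is inferred from the printed hpet_res value rather than taken from a kernel header; DIRECT_MAP_BASE, direct_map_to_phys() and bytes_above_16mb() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: base of the x86-64 direct mapping in this kernel, inferred
 * from the printed hpet_res value; not taken from a kernel header. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define SIXTEEN_MB      (16ULL << 20)   /* 0x1000000 */

/* Convert a direct-mapped kernel virtual address to a physical address,
 * in the spirit of __pa() for addresses inside that region. */
static uint64_t direct_map_to_phys(uint64_t virt)
{
    assert(virt >= DIRECT_MAP_BASE);
    return virt - DIRECT_MAP_BASE;
}

/* Distance in bytes of a physical address above 16MB (0 if below it). */
static uint64_t bytes_above_16mb(uint64_t phys)
{
    return phys >= SIXTEEN_MB ? phys - SIXTEEN_MB : 0;
}
```

With the reported value, direct_map_to_phys(0xffff88000100f000) yields 0x100f000, and bytes_above_16mb() on that gives 0xf000, i.e. 61440 bytes (60KB) past 16MB.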

To test this theory, could you tweak this:

alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

to be:

alloc_bootmem_low(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE);

This will allocate the HPET resource descriptor in low RAM.

Another idea: could you increase HPET_RESOURCE_NAME_SIZE from 9 to
something larger (via the patch below)? Maybe the bug is that this
overflows:

snprintf((char *)hpet_res->name, HPET_RESOURCE_NAME_SIZE, "HPET %u",
hpet_tbl->sequence);

and corrupts the memory next to the HPET resource descriptor. Depending
on random details of the kernel, this may or may not turn into a real
problem. Allocating the resource and its name string together in a
single bootmem allocation is a bit quirky, but should be OK
otherwise.
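A rough userspace sketch of that combined allocation, assuming the name buffer sits directly after the descriptor in the same block. struct fake_resource and alloc_named_resource() are hypothetical stand-ins for struct resource and the boot.c code, and calloc() stands in for alloc_bootmem(); only the memory layout is meant to match.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define HPET_RESOURCE_NAME_SIZE 9   /* size quoted in the mail */

/* Hypothetical stand-in for struct resource (only a few fields). */
struct fake_resource {
    const char *name;
    unsigned long long start, end;
    unsigned long flags;
};

/* One allocation holds the descriptor followed by its name buffer:
 * &res[1] is the first byte past the struct. */
static struct fake_resource *alloc_named_resource(unsigned int sequence)
{
    struct fake_resource *res =
        calloc(1, sizeof(*res) + HPET_RESOURCE_NAME_SIZE);
    if (!res)
        return NULL;
    res->name = (const char *)&res[1];
    /* Standard C snprintf writes at most HPET_RESOURCE_NAME_SIZE bytes,
     * including the terminating NUL. */
    int n = snprintf((char *)res->name, HPET_RESOURCE_NAME_SIZE,
                     "HPET %u", sequence);
    assert(n >= 0);   /* negative only on an encoding error */
    return res;
}
```

Anything written past the HPET_RESOURCE_NAME_SIZE bytes after &res[1] would land outside the block, which is the kind of corruption suspected above.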

Hm, I see you printed out hpet_tbl->sequence and it is 0, which should
be OK in terms of overflow: "HPET 0" needs 7 bytes including the
terminating NUL, within the 9-byte buffer. Still, it cannot hurt to add
this patch to your queue of test patches :-/
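The sizing can be checked with a short C sketch that asks snprintf() how long "HPET %u" would be. HPET_RESOURCE_NAME_SIZE is set to the 9 mentioned above; hpet_name_bytes_needed() and hpet_name_fits() are hypothetical helpers. This relies on C99 snprintf(NULL, 0, ...) returning the untruncated length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define HPET_RESOURCE_NAME_SIZE 9   /* current size quoted in the mail */

/* Bytes needed for "HPET <sequence>", including the terminating NUL.
 * snprintf(NULL, 0, ...) returns the length it would have written. */
static int hpet_name_bytes_needed(unsigned int sequence)
{
    int len = snprintf(NULL, 0, "HPET %u", sequence);
    assert(len >= 0);
    return len + 1;
}

/* True if the formatted name fits without truncation. */
static bool hpet_name_fits(unsigned int sequence)
{
    return hpet_name_bytes_needed(sequence) <= HPET_RESOURCE_NAME_SIZE;
}
```

For sequence 0 this gives 7 bytes (6 characters plus the NUL), so it fits; the first sequence number that no longer fits in 9 bytes is 1000.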

Also, you could try increasing the bootmem allocation drastically, say
by 16*1024 bytes, via:

hpet_res = alloc_bootmem(sizeof(*hpet_res) + HPET_RESOURCE_NAME_SIZE + 16*1024);
hpet_res = (void *)hpet_res + 15*1024;

This pads the memory around 16MB, leaving the first 15KB of the
allocation unused by any resource. Admittedly a really weird hack, but
I'm running out of ideas...
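The padding trick, sketched in userspace with malloc() standing in for alloc_bootmem(); alloc_padded() and the PAD_* constants are hypothetical names. It uses char * arithmetic, since arithmetic on void * as in the snippet above is a GCC extension (fine in the kernel, not in standard C).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAD_TOTAL  (16 * 1024)   /* extra bytes requested, as in the mail */
#define PAD_OFFSET (15 * 1024)   /* how far the descriptor is pushed in */

/* Over-allocate and return a pointer PAD_OFFSET bytes into the block.
 * *base receives the start of the block so the caller can free it
 * (bootmem memory is never freed, so the kernel hack needs no handle). */
static void *alloc_padded(size_t payload, void **base)
{
    char *block = malloc(payload + PAD_TOTAL);
    if (!block)
        return NULL;
    *base = block;
    char *p = block + PAD_OFFSET;
    /* Room after p: the payload plus PAD_TOTAL - PAD_OFFSET (1KB) of slack. */
    assert((size_t)(block + payload + PAD_TOTAL - p) ==
           payload + (PAD_TOTAL - PAD_OFFSET));
    return p;
}
```

With a 64-byte payload the returned pointer sits exactly 15*1024 bytes into the block, leaving 1KB of slack after the payload as well.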

Ingo

------------------>