Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Andy Lutomirski
Date: Wed Mar 18 2015 - 17:21:53 EST


On Wed, Mar 18, 2015 at 2:12 PM, Stefan Seyfried
<stefan.seyfried@xxxxxxxxxxxxxx> wrote:
> Am 18.03.2015 um 21:51 schrieb Andy Lutomirski:
>> On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
>> <stefan.seyfried@xxxxxxxxxxxxxx> wrote:
>
>>>> The relevant thread's stack is here (see ti in the trace):
>>>>
>>>> ffff8801013d4000
>>>>
>>>> It could be interesting to see what's there.
>>>>
>>>> I don't suppose you want to try to walk the paging structures to see
>>>> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
>>>> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
>>>> present? You'd only have to walk one level -- presumably, if the PGD
>>>> entry is there, the rest of the entries are okay, too.
>>>
>>> That's all Greek to me :-)
>>>
>>> I see that there is something at ffff88023bc80000:
>>>
>>> crash> x /64xg 0xffff88023bc80000
>>> 0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
>>> 0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
>>> 0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
>>> 0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
>>> 0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
>>> [all zeroes]
>>> 0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000
>>>
>>> old_rsp and kernel_stack seem bogus:
>>> crash> print old_rsp
>>> Cannot access memory at address 0xa200
>>> gdb: gdb request failed: print old_rsp
>>> crash> print kernel_stack
>>> Cannot access memory at address 0xaa48
>>> gdb: gdb request failed: print kernel_stack
>>>
>>> kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:
>>
>> Yup. old_rsp and kernel_stack are offsets relative to gsbase.
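A minimal sketch of that arithmetic, using only the gsbase and the two offsets that gdb printed above. On a given CPU, a per-CPU variable lives at that CPU's GS base plus the variable's link-time offset. The names GSBASE, PERCPU_OFFSETS and percpu_address are illustrative only. They are not kernel or crash symbols.

```python
# gsbase as seen in the trace; old_rsp and kernel_stack are per-CPU variables,
# so the "addresses" gdb reported for them are offsets from gsbase.
GSBASE = 0xFFFF88023BC80000
PERCPU_OFFSETS = {
    "old_rsp": 0xA200,       # from "Cannot access memory at address 0xa200"
    "kernel_stack": 0xAA48,  # from "Cannot access memory at address 0xaa48"
}

def percpu_address(gsbase: int, offset: int) -> int:
    """Linear address of a per-CPU variable on the CPU whose GS base is gsbase."""
    return gsbase + offset

for name, off in PERCPU_OFFSETS.items():
    print(f"{name:13s} -> {percpu_address(GSBASE, off):#018x}")
```

With these numbers, kernel_stack ends up at 0xffff88023bc8aa48. That is 0x48 bytes into the 64-quadword (0x200-byte) window starting at 0xffff88023bc8aa00.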
>>
>>>
>>> crash> x /64xg 0xffff88023bc8aa00
>>> 0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000
>>
>> [...]
>>
>> I don't know enough about crashkernel to know whether the fact that
>> this worked means anything.
>
> AFAIK this just means that the memory at this location is included in
> the dump :-)
>
>> Can you dump the page of physical memory at 0x4779a067? That's the PGD.
>
> Unfortunately not; this is a partial dump (I think the default config in
> openSUSE, but I might have changed it some time ago), and the dump_level
> is 31, which means that the following are excluded:
>
>        |      |cache  |cache  |      |
>   dump | zero |without|with   | user | free
>  level | page |private|private| data | page
> -------+------+-------+-------+------+------
>     31 |  X   |   X   |   X   |  X   |  X
>
> so this:
> crash> x /64xg 0x4779a067
> 0x4779a067: Cannot access memory at address 0x4779a067
> gdb: gdb request failed: x /64xg
>
> probably just means that the PGD falls in one of the above excluded
> categories.
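The dump_level is a bitmask. A value of 31 sets all five bits, which is why every column in the table is marked. The sketch below decodes it, assuming the per-bit values documented for makedumpfile's -d option: 1 zero pages, 2 cache without private, 4 cache with private, 8 user data, 16 free pages. The helper name is hypothetical.

```python
# Page-type bits of makedumpfile's dump_level (per makedumpfile's documentation)
EXCLUDE_BITS = [
    (1, "zero page"),
    (2, "cache without private"),
    (4, "cache with private"),
    (8, "user data"),
    (16, "free page"),
]

def excluded_page_types(dump_level: int) -> list[str]:
    """Return the page types that a given dump_level leaves out of the dump."""
    return [name for bit, name in EXCLUDE_BITS if dump_level & bit]

print(excluded_page_types(31))
```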

I suspect that it actually means that gdb sees virtual addresses, not
physical addresses. But I screwed up completely -- "PGD" in the dump
is the PGD *entry*, not the PGD pointer.
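To illustrate the distinction: an x86-64 paging-structure entry packs the physical address of the next-level table into bits 12-51. It packs flag bits into the low 12 bits, plus NX in bit 63. The sketch below decodes the 0x4779a067 value from the dump, assuming standard 4-level paging. The constant and function names are illustrative.

```python
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical address of next-level table

FLAG_BITS = [
    (1 << 0, "P"),    # present
    (1 << 1, "RW"),   # writable
    (1 << 2, "US"),   # user-accessible
    (1 << 3, "PWT"),  # page-level write-through
    (1 << 4, "PCD"),  # page-level cache disable
    (1 << 5, "A"),    # accessed
    (1 << 6, "D"),    # dirty (the CPU ignores this bit in non-leaf entries)
    (1 << 63, "NX"),  # no-execute
]

def decode_entry(entry: int) -> tuple[int, list[str]]:
    """Split a paging-structure entry into (next-level table phys addr, flag names)."""
    phys = entry & ADDR_MASK
    flags = [name for bit, name in FLAG_BITS if entry & bit]
    return phys, flags

phys, flags = decode_entry(0x4779A067)
print(f"next-level table at phys {phys:#x}, flags {'|'.join(flags)}")
# prints: next-level table at phys 0x4779a000, flags P|RW|US|A|D
```

So 0x4779a067 is not itself a table address. It is the physical address 0x4779a000 with the flag bits 0x067 OR'd in.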

We could plausibly fish it out from current->mm, but that's a mess. I
don't suppose that "info registers" or "p/x $cr3" will show the cr3
value?
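If cr3 can be recovered, the one-level walk suggested earlier in the thread amounts to indexing the PGD page with bits 39-47 of the virtual address. The sketch below assumes 4-level paging. The cr3 value is a made-up placeholder, not taken from this crash, and the helper names are hypothetical.

```python
PGDIR_SHIFT = 39     # with 4-level paging, each PGD entry maps 512 GiB
PTRS_PER_PGD = 512   # 4 KiB page / 8-byte entries
PAGE_MASK = ~0xFFF   # clears the low 12 bits (flag/PCID bits in cr3)

def pgd_index(vaddr: int) -> int:
    """Index of the PGD entry that maps vaddr (bits 39..47)."""
    return (vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)

def pgd_entry_phys(cr3: int, vaddr: int) -> int:
    """Physical address of the 8-byte PGD entry for vaddr."""
    return (cr3 & PAGE_MASK) + 8 * pgd_index(vaddr)

# HYPOTHETICAL cr3 value for illustration only; not from this crash dump
cr3 = 0x1234000
for vaddr in (0xFFFF88023BC80000, 0xFFFF88023BC8AA48, 0xFFFF8801013D4000):
    print(f"{vaddr:#x}: PGD index {pgd_index(vaddr)}, "
          f"entry at phys {pgd_entry_phys(cr3, vaddr):#x}")
```

The addresses discussed in this thread all share PGD index 272: gsbase, the per-CPU kernel_stack slot and the thread stack at 0xffff8801013d4000. So a single PGD entry covers all of them.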

In any case, Denys is right -- my theory doesn't really hold water on
non-SMAP systems.

--Andy

>
> Best regards,
>
> Stefan
> --
> Stefan Seyfried
> Linux Consultant & Developer -- GPG Key: 0x731B665B
>
> B1 Systems GmbH
> Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
> GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537



--
Andy Lutomirski
AMA Capital Management, LLC