Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Stefan Seyfried
Date: Wed Mar 18 2015 - 16:06:10 EST


Hi Andy,

Am 18.03.2015 um 20:26 schrieb Andy Lutomirski:
> Hi Linus-
>
> You seem to enjoy debugging these things. Want to give this a shot?
> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
> right after swapgs in syscall entry.
>
> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried
> <stefan.seyfried@xxxxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> first, I'm kind of happy that I'm not the only one seeing this, and
>> thus my beloved Thinkpad can stay for a bit longer... :-)
>>
>> Then, I'm mostly an amateur when it comes to kernel debugging, so bear
>> with me when I'm stumbling through the code...
>>
>> Am 18.03.2015 um 19:03 schrieb Andy Lutomirski:
>>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
>>>> At Wed, 18 Mar 2015 18:43:52 +0100,
>>>> Takashi Iwai wrote:
>>>>>
>>>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>>>> Takashi Iwai wrote:
>>>>>>
>>>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>>>> Stefan Seyfried wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>>>> is the backtrace from dmesg.txt:
>>>>>>>
>>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>>
>>> OK, we double faulted. Too bad that x86 CPUs don't tell us why.
>>>
>>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>> [242060.604885] RIP: 0010:[<ffffffff816834ad>] [<ffffffff816834ad>] page_fault+0xd/0x30
>>>
>>> The double fault happened during page fault processing. Could you
>>> disassemble your page_fault function to find the offending
>>> instruction?
>>
>> This one is easy:
>>
>> crash> disassemble page_fault
>> Dump of assembler code for function page_fault:
>> 0xffffffff816834a0 <+0>: data32 xchg %ax,%ax
>> 0xffffffff816834a3 <+3>: data32 xchg %ax,%ax
>> 0xffffffff816834a6 <+6>: data32 xchg %ax,%ax
>> 0xffffffff816834a9 <+9>: sub $0x78,%rsp
>> 0xffffffff816834ad <+13>: callq 0xffffffff81683620 <error_entry>
>
> The callq was the double-faulting instruction, and it is indeed the
> first function in here that would have accessed the stack. (The sub
> *changes* rsp but isn't a memory access.) So, since RSP is bogus, we
> page fault, and the page fault is promoted to a double fault. The
> surprising thing is that the page fault itself seems to have been
> delivered okay, and RSP wasn't on a page boundary.
>
> You wouldn't happen to be using a Broadwell machine?

No, this is a quite old Thinkpad X200s, Core2duo
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
stepping : 10
microcode : 0xa0c

> The only way to get here with bogus RSP is if we interrupted something
> that was previously running at CPL0 with similarly bogus RSP.
>
> I don't know if I trust CR2. It's 16 bytes lower than I'd expect.
>
>> 0xffffffff816834b2 <+18>: mov %rsp,%rdi
>> 0xffffffff816834b5 <+21>: mov 0x78(%rsp),%rsi
>> 0xffffffff816834ba <+26>: movq $0xffffffffffffffff,0x78(%rsp)
>> 0xffffffff816834c3 <+35>: callq 0xffffffff810504e0 <do_page_fault>
>> 0xffffffff816834c8 <+40>: jmpq 0xffffffff816836d0 <error_exit>
>> End of assembler dump.
>>
>>
>>>>>>> [242060.604893] RSP: 0018:00007fffa55eafb8 EFLAGS: 00010016
>>>
>>> Uh, what? That RSP is a user address.
>>>
>>>>>>> [242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
>>>>>>> [242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
>>>>>>> [242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
>>>>>>> [242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
>>>>>>> [242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
>>>>>>> [242060.604904] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>> [242060.604906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>> [242060.604909] Stack:
>>>>>>> [242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
>>>>>>> [242060.604995] IP: [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>
>>> This is suspicious. We need to have died, again, of a fatal page
>>> fault while dumping the stack.
>>
>> I posted the same problem to the opensuse kernel list shortly before turning
>> to LKML. There, Michal Kubecek noted:
>>
>> "I encountered a similar problem recently. The thing is, x86
>> specification says that on a double fault, RIP and RSP registers are
>> undefined, i.e. you not only can't expect them to contain values
>> corresponding to the first or second fault but you can't even expect
>> them to have any usable values at all. Unfortunately the kernel double
>> fault handler doesn't take this into account and does try to display
>> usual crash related information so that it itself does usually crash
>> when trying to show stack content (that's the show_stack_log_lvl()
>> crash).
>
> I think that's not entirely true. RIP is reliable for many classes of
> double faults, and we rely on that for espfix64. The fact that hpa
> was willing to write that code strongly suggests that Intel chips at
> least really do work that way.
>
>>
>> The result is a double fault (which itself would be very hard to debug)
>> followed by a crash in its handler so that analysing the outcome is
>> extremely difficult."
>>
>> I cannot judge if this is true, but it sounded related to solving the
>> problem to me.
>
>
> The crash in the handler is a separate bug.
>
>
>
>>
>>>>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
>>>>>>> [242060.605078] Oops: 0000 [#1] PREEMPT SMP
>>>>>>> [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
>>>>>>> [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
>>>>>>> [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>> [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>> [242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>> [242060.605396] RIP: 0010:[<ffffffff81005b44>] [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>> [242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
>>>>>>> [242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
>>>>>>> [242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
>>>>>>> [242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
>>>>>>> [242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
>>>>>>> [242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
>>>>>>> [242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>> [242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>> [242060.605396] Stack:
>>>>>>> [242060.605396] 0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
>>>>>>> [242060.605396] 0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
>>>>>>> [242060.605396] 00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
>>>>>>> [242060.605396] Call Trace:
>>>>>>> [242060.605396] [<ffffffff81005c36>] show_regs+0x86/0x210
>>>>>>> [242060.605396] [<ffffffff8104636f>] df_debug+0x1f/0x30
>>>>>>> [242060.605396] [<ffffffff810041a4>] do_double_fault+0x84/0x100
>>>>>>> [242060.605396] [<ffffffff81683088>] double_fault+0x28/0x30
>>>>>>> [242060.605396] [<ffffffff816834ad>] page_fault+0xd/0x30
>>>>>>> [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
>>>>>>> [242060.605396] RIP [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>> [242060.605396] RSP <ffff88023bc84e88>
>>>>>>> [242060.605396] CR2: 00007fffa55eafb8
>>>>>>>
>>>>>>> I would not totally rule out a hardware problem, since this machine had
>>>>>>> another weird crash where it crashed and the bios beeper was constant
>>>>>>> on until I hit the power button for 5 seconds.
>>>>>>>
>>>>>>> Unfortunately, I cannot load the crashdump with the crash version in
>>>>>>> openSUSE Tumbleweed, so the backtrace is all I have for now.
>>>>>>
>>>>>> Just "me too", I'm getting the very same crash out of sudden with the
>>>>>> recent 4.0-rc. Judging from the very same pattern (usually crash
>>>>>> happened while using KVM (-smp 4) and kernel builds with -j8), I don't
>>>>>> think it's a hardware problem.
>>>>>
>>>>> The git bisection pointed to the commit:
>>>>> commit b926e6f61a26036ee9eabe6761483954d481ad25
>>>>> x86, traps: Fix ist_enter from userspace
>>>>>
>>>>> And reverting this on top of the latest Linus tree seems working.
>>>>> Seife, could you verify on your machine, too?
>>>>
>>>> Argh, false positive. Right after I wrote this mail, I got the very
>>>> same crash. I seem to need running the test much longer than I
>>>> thought.
>>>>
>>>> But somehow the commits around the above smell suspicious...
>>>>
>>>
>>> Those commits shouldn't really have affected page fault or double
>>> fault behavior. They made big changes to MCE, breakpoints, and debug
>>> exceptions.
>>>
>>> Something's very wrong here. I'm guessing that we somehow ended up in
>>> page_fault in a completely invalid context.
>>>
>>> One hairy code path that could plausibly do this is:
>>>
>>> 1. syscall
>>>
>>> 2. vmalloc fault accessing old_rsp aka rsp_scratch (or kernel_stack --
>>> same issue)
>>>
>>> 3. page fault. Now we're on the user stack and we do awful things.
>>> If we run off the end of the presently writable portion of the stack,
>>> we double fault.
>>
>> Maybe Michal's idea from above points in the right direction?
>>
>> Now since I have a crash dump and the corresponding debuginfo at hand,
>> might this help somehow to find out where the problem originated from?
>>
>> I mean -- it's only 508 processes to look at :-) but if I knew what to
>> look for in their backtraces, I would try to do it.
>
> The relevant thread's stack is here (see ti in the trace):
>
> ffff8801013d4000
>
> It could be interesting to see what's there.
>
> I don't suppose you want to try to walk the paging structures to see
> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
> present? You'd only have to walk one level -- presumably, if the PGD
> entry is there, the rest of the entries are okay, too.

That's all greek to me :-)

I see that there is something at ffff88023bc80000:

crash> x /64xg 0xffff88023bc80000
0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
0xffff88023bc80050: 0x0000000000000000 0x0000000000000000
0xffff88023bc80060: 0x0000000000000000 0x0000000000000000
0xffff88023bc80070: 0x0000000000000000 0x0000000000000000
0xffff88023bc80080: 0x0000000000000000 0x0000000000000000
0xffff88023bc80090: 0x0000000000000000 0x0000000000000000
0xffff88023bc800a0: 0x0000000000000000 0x0000000000000000
0xffff88023bc800b0: 0x0000000000000000 0x0000000000000000
0xffff88023bc800c0: 0x0000000000000000 0x0000000000000000
0xffff88023bc800d0: 0x0000000000000000 0x0000000000000000
0xffff88023bc800e0: 0x0000000000000000 0x0000000000000000
0xffff88023bc800f0: 0x0000000000000000 0x0000000000000000
0xffff88023bc80100: 0x0000000000000000 0x0000000000000000
0xffff88023bc80110: 0x0000000000000000 0x0000000000000000
0xffff88023bc80120: 0x0000000000000000 0x0000000000000000
0xffff88023bc80130: 0x0000000000000000 0x0000000000000000
0xffff88023bc80140: 0x0000000000000000 0x0000000000000000
0xffff88023bc80150: 0x0000000000000000 0x0000000000000000
0xffff88023bc80160: 0x0000000000000000 0x0000000000000000
0xffff88023bc80170: 0x0000000000000000 0x0000000000000000
0xffff88023bc80180: 0x0000000000000000 0x0000000000000000
0xffff88023bc80190: 0x0000000000000000 0x0000000000000000
0xffff88023bc801a0: 0x0000000000000000 0x0000000000000000
0xffff88023bc801b0: 0x0000000000000000 0x0000000000000000
0xffff88023bc801c0: 0x0000000000000000 0x0000000000000000
0xffff88023bc801d0: 0x0000000000000000 0x0000000000000000
0xffff88023bc801e0: 0x0000000000000000 0x0000000000000000
0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000

old_rsp and kernel_stack seem bogus:
crash> print old_rsp
Cannot access memory at address 0xa200
gdb: gdb request failed: print old_rsp
crash> print kernel_stack
Cannot access memory at address 0xaa48
gdb: gdb request failed: print kernel_stack

kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:

crash> x /64xg 0xffff88023bc8aa00
0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa10: 0x0000000000000000 0xffff880103f46150
0xffff88023bc8aa20: 0xffffffff80010001 0xffff88023bc83fc0
0xffff88023bc8aa30: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa40: 0xffff880103f46150 0xffff8801013d7fd8
0xffff88023bc8aa50: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa60: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa70: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa80: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aa90: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aaa0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aab0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aac0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aad0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aae0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aaf0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab00: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab10: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab20: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab30: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab40: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab50: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab60: 0x0000000000000000 0x000000007fffffff
0xffff88023bc8ab70: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab80: 0x0000000000000000 0x0000000000000000
0xffff88023bc8ab90: 0x0000000000000000 0x0000000000000000
0xffff88023bc8aba0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8abb0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8abc0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8abd0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8abe0: 0x0000000000000000 0x0000000000000000
0xffff88023bc8abf0: 0x0000000000000000 0x0000000000000000

and for old_rsp:
crash> x /64xg 0xffff88023bc8a200
0xffff88023bc8a200: 0x00007fffa55f1b98 0x0000000000000000
0xffff88023bc8a210: 0x0000000000000000 0x0000000000000000
0xffff88023bc8a220: 0x0000000000000000 0x0000000000000000
0xffff88023bc8a230: 0x0000000000003ce6 0x0000000000000000
0xffff88023bc8a240: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a250: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a260: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a270: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a280: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a290: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a2a0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a2b0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a2c0: 0xffffffffffffffff 0xffffffff0000001b
0xffff88023bc8a2d0: 0xffffffff00000023 0xffffffffffffffff
0xffff88023bc8a2e0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a2f0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a300: 0x0000000100000000 0x0000000300000002
0xffff88023bc8a310: 0x0000000500000004 0x0000000700000006
0xffff88023bc8a320: 0x0000000900000008 0x0000000b0000000a
0xffff88023bc8a330: 0x0000000d0000000c 0x0000000f0000000e
0xffff88023bc8a340: 0x00000014ffffffff 0xffffffffffffffff
0xffff88023bc8a350: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a360: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a370: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a380: 0x00000015ffffffff 0xffffffffffffffff
0xffff88023bc8a390: 0xffffffff0000001e 0xffffffffffffffff
0xffff88023bc8a3a0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a3b0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a3c0: 0x00000016ffffffff 0xffffffffffffffff
0xffff88023bc8a3d0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a3e0: 0xffffffffffffffff 0xffffffffffffffff
0xffff88023bc8a3f0: 0xffffffffffffffff 0xffffffffffffffff

But it's entirely possible I'm using the tool wrong.

What type of data is supposed to be at those places? I tried
crash> print *(pgtable_t) 0xffff88023bc8a200
but I cannot judge if the output is useful :-(

Best regards,

Stefan
--
Stefan Seyfried
Linux Consultant & Developer -- GPG Key: 0x731B665B

B1 Systems GmbH
OsterfeldstraÃe 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/