Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Andy Lutomirski
Date: Wed Mar 18 2015 - 16:51:38 EST


On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried
<stefan.seyfried@xxxxxxxxxxxxxx> wrote:
> Hi Andy,
>
> Am 18.03.2015 um 20:26 schrieb Andy Lutomirski:
>> Hi Linus-
>>
>> You seem to enjoy debugging these things. Want to give this a shot?
>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>> right after swapgs in syscall entry.
>>
>> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried
>> <stefan.seyfried@xxxxxxxxxxxxxx> wrote:
>>> Hi all,
>>>
>>> first, I'm kind of happy that I'm not the only one seeing this, and
>>> thus my beloved Thinkpad can stay for a bit longer... :-)
>>>
>>> Then, I'm mostly an amateur when it comes to kernel debugging, so bear
>>> with me when I'm stumbling through the code...
>>>
>>> Am 18.03.2015 um 19:03 schrieb Andy Lutomirski:
>>>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai <tiwai@xxxxxxx> wrote:
>>>>> At Wed, 18 Mar 2015 18:43:52 +0100,
>>>>> Takashi Iwai wrote:
>>>>>>
>>>>>> At Wed, 18 Mar 2015 15:16:42 +0100,
>>>>>> Takashi Iwai wrote:
>>>>>>>
>>>>>>> At Sun, 15 Mar 2015 09:17:15 +0100,
>>>>>>> Stefan Seyfried wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump, this
>>>>>>>> is the backtrace from dmesg.txt:
>>>>>>>>
>>>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>>>
>>>> OK, we double faulted. Too bad that x86 CPUs don't tell us why.
>>>>
>>>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>>> [242060.604885] RIP: 0010:[<ffffffff816834ad>] [<ffffffff816834ad>] page_fault+0xd/0x30
>>>>
>>>> The double fault happened during page fault processing. Could you
>>>> disassemble your page_fault function to find the offending
>>>> instruction?
>>>
>>> This one is easy:
>>>
>>> crash> disassemble page_fault
>>> Dump of assembler code for function page_fault:
>>> 0xffffffff816834a0 <+0>: data32 xchg %ax,%ax
>>> 0xffffffff816834a3 <+3>: data32 xchg %ax,%ax
>>> 0xffffffff816834a6 <+6>: data32 xchg %ax,%ax
>>> 0xffffffff816834a9 <+9>: sub $0x78,%rsp
>>> 0xffffffff816834ad <+13>: callq 0xffffffff81683620 <error_entry>
>>
>> The callq was the double-faulting instruction, and it is indeed the
>> first function in here that would have accessed the stack. (The sub
>> *changes* rsp but isn't a memory access.) So, since RSP is bogus, we
>> page fault, and the page fault is promoted to a double fault. The
>> surprising thing is that the page fault itself seems to have been
>> delivered okay, and RSP wasn't on a page boundary.
>>
>> You wouldn't happen to be using a Broadwell machine?
>
> No, this is a quite old Thinkpad X200s, Core2duo
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 6
> model : 23
> model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
> stepping : 10
> microcode : 0xa0c
>
>> The only way to get here with bogus RSP is if we interrupted something
>> that was previously running at CPL0 with similarly bogus RSP.
>>
>> I don't know if I trust CR2. It's 16 bytes lower than I'd expect.
>>
>>> 0xffffffff816834b2 <+18>: mov %rsp,%rdi
>>> 0xffffffff816834b5 <+21>: mov 0x78(%rsp),%rsi
>>> 0xffffffff816834ba <+26>: movq $0xffffffffffffffff,0x78(%rsp)
>>> 0xffffffff816834c3 <+35>: callq 0xffffffff810504e0 <do_page_fault>
>>> 0xffffffff816834c8 <+40>: jmpq 0xffffffff816836d0 <error_exit>
>>> End of assembler dump.
>>>
>>>
>>>>>>>> [242060.604893] RSP: 0018:00007fffa55eafb8 EFLAGS: 00010016
>>>>
>>>> Uh, what? That RSP is a user address.
>>>>
>>>>>>>> [242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
>>>>>>>> [242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
>>>>>>>> [242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
>>>>>>>> [242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
>>>>>>>> [242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
>>>>>>>> [242060.604904] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>>> [242060.604906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>>> [242060.604909] Stack:
>>>>>>>> [242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
>>>>>>>> [242060.604995] IP: [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>
>>>> This is suspicious. We need to have died, again, of a fatal page
>>>> fault while dumping the stack.
>>>
>>> I posted the same problem to the opensuse kernel list shortly before turning
>>> to LKML. There, Michal Kubecek noted:
>>>
>>> "I encountered a similar problem recently. The thing is, x86
>>> specification says that on a double fault, RIP and RSP registers are
>>> undefined, i.e. you not only can't expect them to contain values
>>> corresponding to the first or second fault but you can't even expect
>>> them to have any usable values at all. Unfortunately the kernel double
>>> fault handler doesn't take this into account and does try to display
>>> usual crash related information so that it itself does usually crash
>>> when trying to show stack content (that's the show_stack_log_lvl()
>>> crash).
>>
>> I think that's not entirely true. RIP is reliable for many classes of
>> double faults, and we rely on that for espfix64. The fact that hpa
>> was willing to write that code strongly suggests that Intel chips at
>> least really do work that way.
>>
>>>
>>> The result is a double fault (which itself would be very hard to debug)
>>> followed by a crash in its handler so that analysing the outcome is
>>> extremely difficult."
>>>
>>> I cannot judge if this is true, but it sounded related to solving the
>>> problem to me.
>>
>>
>> The crash in the handler is a separate bug.
>>
>>
>>
>>>
>>>>>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
>>>>>>>> [242060.605078] Oops: 0000 [#1] PREEMPT SMP
>>>>>>>> [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
>>>>>>>> [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
>>>>>>>> [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>>> [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>>> [242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>>> [242060.605396] RIP: 0010:[<ffffffff81005b44>] [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>>> [242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
>>>>>>>> [242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
>>>>>>>> [242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
>>>>>>>> [242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
>>>>>>>> [242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
>>>>>>>> [242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
>>>>>>>> [242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>>> [242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>>> [242060.605396] Stack:
>>>>>>>> [242060.605396] 0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
>>>>>>>> [242060.605396] 0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
>>>>>>>> [242060.605396] 00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
>>>>>>>> [242060.605396] Call Trace:
>>>>>>>> [242060.605396] [<ffffffff81005c36>] show_regs+0x86/0x210
>>>>>>>> [242060.605396] [<ffffffff8104636f>] df_debug+0x1f/0x30
>>>>>>>> [242060.605396] [<ffffffff810041a4>] do_double_fault+0x84/0x100
>>>>>>>> [242060.605396] [<ffffffff81683088>] double_fault+0x28/0x30
>>>>>>>> [242060.605396] [<ffffffff816834ad>] page_fault+0xd/0x30
>>>>>>>> [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
>>>>>>>> [242060.605396] RIP [<ffffffff81005b44>] show_stack_log_lvl+0x124/0x190
>>>>>>>> [242060.605396] RSP <ffff88023bc84e88>
>>>>>>>> [242060.605396] CR2: 00007fffa55eafb8
>>>>>>>>
>>>>>>>> I would not totally rule out a hardware problem, since this machine had
>>>>>>>> another weird crash where it crashed and the bios beeper was constant
>>>>>>>> on until I hit the power button for 5 seconds.
>>>>>>>>
>>>>>>>> Unfortunately, I cannot load the crashdump with the crash version in
>>>>>>>> openSUSE Tumbleweed, so the backtrace is all I have for now.
>>>>>>>
>>>>>>> Just "me too", I'm getting the very same crash out of sudden with the
>>>>>>> recent 4.0-rc. Judging from the very same pattern (usually crash
>>>>>>> happened while using KVM (-smp 4) and kernel builds with -j8), I don't
>>>>>>> think it's a hardware problem.
>>>>>>
>>>>>> The git bisection pointed to the commit:
>>>>>> commit b926e6f61a26036ee9eabe6761483954d481ad25
>>>>>> x86, traps: Fix ist_enter from userspace
>>>>>>
>>>>>> And reverting this on top of the latest Linus tree seems working.
>>>>>> Seife, could you verify on your machine, too?
>>>>>
>>>>> Argh, false positive. Right after I wrote this mail, I got the very
>>>>> same crash. I seem to need running the test much longer than I
>>>>> thought.
>>>>>
>>>>> But somehow the commits around the above smell suspicious...
>>>>>
>>>>
>>>> Those commits shouldn't really have affected page fault or double
>>>> fault behavior. They made big changes to MCE, breakpoints, and debug
>>>> exceptions.
>>>>
>>>> Something's very wrong here. I'm guessing that we somehow ended up in
>>>> page_fault in a completely invalid context.
>>>>
>>>> One hairy code path that could plausibly do this is:
>>>>
>>>> 1. syscall
>>>>
>>>> 2. vmalloc fault accessing old_rsp aka rsp_scratch (or kernel_stack --
>>>> same issue)
>>>>
>>>> 3. page fault. Now we're on the user stack and we do awful things.
>>>> If we run off the end of the presently writable portion of the stack,
>>>> we double fault.
>>>
>>> Maybe Michal's idea from above points in the right direction?
>>>
>>> Now since I have a crash dump and the corresponding debuginfo at hand,
>>> might this help somehow to find out where the problem originated from?
>>>
>>> I mean -- it's only 508 processes to look at :-) but if I knew what to
>>> look for in their backtraces, I would try to do it.
>>
>> The relevant thread's stack is here (see ti in the trace):
>>
>> ffff8801013d4000
>>
>> It could be interesting to see what's there.
>>
>> I don't suppose you want to try to walk the paging structures to see
>> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
>> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
>> present? You'd only have to walk one level -- presumably, if the PGD
>> entry is there, the rest of the entries are okay, too.
>
> That's all greek to me :-)
>
> I see that there is something at ffff88023bc80000:
>
> crash> x /64xg 0xffff88023bc80000
> 0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
> 0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80050: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80060: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80070: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80080: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80090: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800a0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800b0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800c0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800d0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800e0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800f0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80100: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80110: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80120: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80130: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80140: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80150: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80160: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80170: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80180: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80190: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801a0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801b0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801c0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801d0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801e0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000
>
> old_rsp and kernel_stack seem bogus:
> crash> print old_rsp
> Cannot access memory at address 0xa200
> gdb: gdb request failed: print old_rsp
> crash> print kernel_stack
> Cannot access memory at address 0xaa48
> gdb: gdb request failed: print kernel_stack
>
> kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:

Yup. old_rsp and kernel_stack are offsets relative to gsbase.

>
> crash> x /64xg 0xffff88023bc8aa00
> 0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000

[...]

I don't know enough about crashkernel to know whether the fact that
this worked means anything.

Can you dump the page of physical memory at 0x4779a067? That's the PGD.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/