Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Denys Vlasenko
Date: Wed Mar 18 2015 - 17:06:24 EST


On 03/18/2015 09:49 PM, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>> On 03/18/2015 08:26 PM, Andy Lutomirski wrote:
>>> Hi Linus-
>>>
>>> You seem to enjoy debugging these things. Want to give this a shot?
>>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>>> right after swapgs in syscall entry.
>>
>> The code is:
>>
>> ENTRY(system_call)
>> SWAPGS_UNSAFE_STACK
>> GLOBAL(system_call_after_swapgs)
>> movq %rsp,PER_CPU_VAR(rsp_scratch)
>> movq PER_CPU_VAR(kernel_stack),%rsp
>>
>> If PER_CPU_VAR(var) memory access can page fault
>> (I was thinking this is ensured to never fault),
>> then on these two instructions such page fault
>> will be fatal: we will still have userspace %rsp.
>>
>> I thought we can only get a NMI or debug interrupt here,
>> and they are both set up to use IST stacks
>> to prevent this scenario (among other reasons).
>
> I don't think that #DB is possible -- we should never have a
> watchpoint on percpu memory like that (unless we're using kgdb, in
> which case I think that kgdb should be fixed).

And #DB shouldn't cause a problem even if it happens (it's on
an IST stack).

I thought about it more, and the thing is, the CPU did manage
to enter the page fault handler.

It means that it managed to store the iret frame.

This means that stores to (%rsp) worked, whatever %rsp is
(even if it points to user's page).

The double fault happened only when CALL insn inside the handler
attempted to push yet another word. _This_ is what did not work.

Why?

I am almost ready to declare that it's SMAP triggering:
that attempts to access (write to) userspace were caught.
However, the disassembly shows:

crash> disassemble page_fault
Dump of assembler code for function page_fault:
0xffffffff816834a0 <+0>: data32 xchg %ax,%ax
0xffffffff816834a3 <+3>: data32 xchg %ax,%ax
0xffffffff816834a6 <+6>: data32 xchg %ax,%ax
0xffffffff816834a9 <+9>: sub $0x78,%rsp
0xffffffff816834ad <+13>: callq 0xffffffff81683620 <error_entry>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^KABOOM HERE^^^^^^^^^^^^^^^^^^^^^^^
0xffffffff816834b2 <+18>: mov %rsp,%rdi
0xffffffff816834b5 <+21>: mov 0x78(%rsp),%rsi
0xffffffff816834ba <+26>: movq $0xffffffffffffffff,0x78(%rsp)
0xffffffff816834c3 <+35>: callq 0xffffffff810504e0 <do_page_fault>
0xffffffff816834c8 <+40>: jmpq 0xffffffff816836d0 <error_exit>
End of assembler dump.

Those NOPs at the beginning are ASM_CLAC and PARAVIRT_ADJUST_EXCEPTION_FRAME
from this source:


.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
ENTRY(\sym)
/* Sanity check */
.if \shift_ist != -1 && \paranoid == 0
.error "using shift_ist requires paranoid=1"
.endif

.if \has_error_code
XCPT_FRAME
.else
INTR_FRAME
.endif

ASM_CLAC
PARAVIRT_ADJUST_EXCEPTION_FRAME

subq $ORIG_RAX-R15, %rsp
call error_entry
...
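
To tie the "subq $ORIG_RAX-R15, %rsp" in the macro to the "sub $0x78,%rsp"
in the disassembly, here is a rough sketch of the arithmetic. The struct
below is hypothetical: it only assumes the usual x86-64 pt_regs field order
of this era (15 general-purpose registers, then orig_ax, then the hardware
frame), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for x86-64 struct pt_regs; field order is assumed. */
struct pt_regs_sketch {
    uint64_t r15, r14, r13, r12, bp, bx;            /* callee-saved */
    uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;  /* caller-clobbered */
    uint64_t orig_ax;                /* slot holding the hardware error code */
    uint64_t ip, cs, flags, sp, ss;  /* iret frame pushed by the CPU */
};

/* Compile-time check of the assumed layout against the disassembly. */
static_assert(offsetof(struct pt_regs_sketch, orig_ax) == 0x78,
              "15 registers * 8 bytes should equal 0x78");

/* Bytes reserved by "subq $ORIG_RAX-R15, %rsp". */
static size_t frame_adjust_bytes(void)
{
    return offsetof(struct pt_regs_sketch, orig_ax)
         - offsetof(struct pt_regs_sketch, r15);
}

/*
 * Distance from the top of the hardware frame down to the return address
 * written by "call error_entry". This ignores the 16-byte alignment the
 * CPU applies to %rsp before pushing the frame.
 */
static size_t call_push_depth(void)
{
    /* ss, sp, flags, cs, ip, error code */
    size_t hw_frame = sizeof(struct pt_regs_sketch)
                    - offsetof(struct pt_regs_sketch, orig_ax);
    return hw_frame + frame_adjust_bytes() + sizeof(uint64_t);
}
```

Under these assumptions frame_adjust_bytes() is 0x78 and call_push_depth()
is 0xb0, so the word pushed by the CALL lands 0x80 bytes below the lowest
word of the frame that the CPU stored successfully.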

If ASM_CLAC was replaced by NOPs, this CPU must not be SMAP-capable.
If so, then another store to (%rsp) should have worked too...


Stefan, Takashi - are you seeing this on SMAP-capable CPUs?
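
One quick way to answer that from userspace is to look for the "smap" flag
in /proc/cpuinfo (the hardware bit is CPUID leaf 7, subleaf 0, EBX bit 20).
A minimal sketch, with hypothetical helper names, assuming the usual
"flags : ..." line format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `token` appears as a whole whitespace-delimited word in `flags`. */
static int flags_have_token(const char *flags, const char *token)
{
    assert(flags != NULL && token != NULL);
    size_t len = strlen(token);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, token)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts && ends)
            return 1;
        p++;  /* substring of a longer flag name, keep scanning */
    }
    return 0;
}

/* Return 1 if the first "flags" line lists smap, 0 if not, -1 if unreadable. */
static int cpuinfo_reports_smap(void)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon)
                found = flags_have_token(colon + 1, "smap");
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling cpuinfo_reports_smap() on the affected machine and printing the
result would be enough. Inside a KVM guest the flag reflects what the
hypervisor exposes, which may differ from the host CPU.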