Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Andy Lutomirski
Date: Wed Mar 18 2015 - 18:25:24 EST


On Wed, Mar 18, 2015 at 3:18 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Mar 18, 2015 at 2:55 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> On Xen, it goes to xen_sysret64, which touches the same percpu
>> variables that we touch on entry. So I still like my percpu vmap
>> fault hypothesis, even though I don't understand what would trigger
>> it.
>
> I don't dislike the theory per se, but not only don't I see how it
> could happen on regular execution on a laptop, but I also don't see
> why this fault behavior would be new to 4.0.
>
> (And I do believe that we should make sure that CPU bringup ends up
> faulting in the percpu area, even if I don't really see why that would
> be the issue here)
>
> Afaik, the system call entry code hasn't changed at all.
>
> What *has* changed is the "paranoid" handling (double-fault has that
> magical "paranoid=2" thing, for example) and the return to user-space
> code.

Indeed. If this were #DB, #BP, or #MC, I'd believe that, but the page
fault code didn't change. And double-fault didn't materially change
-- the paranoid=2 thing is there to opt *out* of the recent changes.
So I'm not convinced by that theory.

>
> Which is really why I don't believe in that syscall thing. Not because
> it isn't the obvious culprit, but simply because it hasn't *changed*.
>
> Or is there something subtle I've missed?

We did change one thing here: for the first time, it's possible to
exit using sysret when we didn't enter using syscall. But this really
shouldn't matter on native, since we don't touch any memory at all
between the stack switch and sysret.

--Andy