Re: [PATCH 05/11] mm: Introduce arch_pgd_init_late()
From: Andy Lutomirski
Date: Tue Sep 22 2015 - 14:37:50 EST
On Tue, Sep 22, 2015 at 11:26 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Sep 22, 2015 at 11:00 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>
>> I really really hate the vmalloc fault thing. It seems to work,
>> rather to my surprise. It doesn't *deserve* to work, because of
>> things like the percpu TSS accesses in the entry code that happen
>> without a valid stack.
>
> The thing is, I think you're misguided in your hatred.
>
> The reason I say that is because I think we should just embrace the
> fact that faults can and do happen in the kernel in very inconvenient
> places, and not just in code we "control".
>
> Even if you get rid of the vmalloc fault, you'll still have debug
> faults, and you'll still have NMI's and horrible crazy machine check
> faults.
>
> I actually think teh vmalloc fault is a good way to just let people
> know "pretty much anything can trap, deal with it".
>
> And I think trying to eliminate them is the wrong thing, because it
> forces us to be so damn synchronized. This whole patch-series is a
> prime example of why that is a bad bad things. We want to have _less_
> synchronization.
Sure, pretty much anything can trap, but we need to do *something* to
deal with it.
Debug faults can't happen with bad stacks any more (now that we honor
the kprobe blacklist), which means that debug faults could, in theory,
move off the IST stack. The SYSENTER + debug mess doesn't have any
stack problem.
NMIs and MCEs are special, and we deal with that using IST and all
kinds of mess.
I don't think that anyone really wants to move #PF to IST, which means
that we simply cannot handle vmalloc faults that happen when switching
stacks after SYSCALL, no matter what fanciness we shove into the
page_fault asm. If we move #PF to IST, then we have to worry about
page_fault -> nmi -> page_fault, which would be a clusterf*ck.
AMD gave us a pile of misguided architectural turds, and we have to
deal with it. My preference is to simplify dealing with it by getting
rid of vmalloc faults so that we can at least reliably touch percpu
memory without faulting.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/