Re: KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas)

From: Thomas Gleixner
Date: Thu Nov 02 2017 - 12:03:45 EST


On Thu, 2 Nov 2017, Andy Lutomirski wrote:
> > On Nov 2, 2017, at 1:45 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > Simpler is not the question. I want to avoid mapping the whole IST stacks.
> >
>
> OK, let's see. We can have the IDT be different in the user tables and
> the kernel tables. The user IDT could have IST-less entry stubs that do
> their own CR3 switch and then bounce to the IST stack. I don't see why
> this wouldn't work aside from requiring a substantially larger entry
> stack, but I'm also not convinced it's worth the added complexity. The
> NMI code would certainly need some careful thought to convince ourselves
> that it would still be correct. #DF would be, um, interesting because of
> the silly ESPFIX64 thing.
>
> My inclination would be to deal with this later. For the first upstream
> version, we map the IST stacks. Later on, we have a separate user IDT
> that does whatever it needs to do.
>
> The argument to the contrary would be that Dave's CR3 code *and* my entry
> stack crap gets simpler if all the CR3 switches happen in special stubs.
>
> The argument against *that* is that this erase_kstack crap might also
> benefit from the magic stack switch. OTOH that's the *exit* stack, which
> is totally independent.

My initial thought was: Always use IST stub stacks for entry and exit.

So the entry/exit stubs deal with the CR3 stuff and also with the extra
magic for espfix and nested NMIs, etc. Once that is done, you just flip
over to the relevant kernel internal stack and switch back to the user
visible one on return. I haven't thought that through completely, but in my
naive view it makes things simpler.
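
To make the ordering concrete, here is a rough user-space model of that
flow. It is only a sketch, not kernel code: struct cpu_model, stub_enter(),
stub_exit() and handler_may_run() are hypothetical names invented for
illustration, and the espfix/nested-NMI handling is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which page tables and which stack are live. */
enum pgtable { PGT_USER, PGT_KERNEL };
enum stack   { STACK_USER, STACK_ENTRY_STUB, STACK_KERNEL };

struct cpu_model {
	enum pgtable cr3;
	enum stack   sp;
};

/*
 * Entry stub: starts on the small user-visible stub stack, switches CR3,
 * and only then flips over to the kernel internal stack.
 */
static void stub_enter(struct cpu_model *cpu)
{
	cpu->sp  = STACK_ENTRY_STUB;	/* entry lands on the stub stack */
	cpu->cr3 = PGT_KERNEL;		/* CR3 switch is done by the stub */
	/* espfix and nested NMI special cases would be handled here */
	cpu->sp  = STACK_KERNEL;	/* flip to the kernel internal stack */
}

/* Exit stub: the mirror image, back to the user-visible stack first. */
static void stub_exit(struct cpu_model *cpu)
{
	assert(cpu->cr3 == PGT_KERNEL);	/* must still be on kernel tables */
	cpu->sp  = STACK_ENTRY_STUB;	/* back to the user-visible stack */
	cpu->cr3 = PGT_USER;		/* drop kernel mappings */
	cpu->sp  = STACK_USER;		/* return to user space */
}

/* Real handler work is only valid on kernel tables and kernel stack. */
static bool handler_may_run(const struct cpu_model *cpu)
{
	return cpu->cr3 == PGT_KERNEL && cpu->sp == STACK_KERNEL;
}
```

The point of the model is just that every CR3 switch lives in the stubs, so
nothing past stub_enter() ever has to care which page tables are active.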

> FWIW, I want to get rid of the #DB and #BP stacks entirely, but that does
> not deserve to block this series, I think.

Agreed.

Thanks,

tglx