Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack
From: Peter Zijlstra
Date: Fri Jun 25 2021 - 07:00:40 EST
On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> >
> >
> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > > > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> > > >
> > > > The kernel currently has no code to keep data breakpoints off the thread
> > > > stack. If a data breakpoint is set on the top area of the thread
> > > > stack, there might be a problem.
> > >
> > > And because the kernel does not prevent data breakpoints on the thread
> > > stack we need to do more complicated things in the already horrible
> > > entry code instead of just doing the obvious and preventing data
> > > breakpoints on the thread stack?
> >
> > Preventing breakpoints on the thread stack is a bit messy: it’s
> > possible for a breakpoint to be set before the address in question is
> > allocated for the thread stack.
>
> How about we call into C from the entry stack and have the from-user
> stack swizzle there. The from-kernel entries land on the ISTs and those
> are already excluded.
>
> > None of this is NMI-specific. #DB itself has the same problem. We
> > could plausibly solve it differently by disarming breakpoints in the
> > entry asm before switching stacks. I’m not sure how much I like that
> > approach.
>
> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> we recurse, we'll get a from-kernel trap, which will land on the IST,
> which is excluded, and then we clear DR7 there.
>
> IST and entry stack are excluded, the only problem we have is thread
> stack, and that can be solved by calling into C from the entry stack.
>
> I should put teaching objtool about .data references from .noinstr.text
> and .entry.text higher on the todo list I suppose ...

Also, I think we can run the from-user exceptions on the entry stack,
without ever switching to the kernel stack, except for #PF, which is
magical and schedules.

Same for SYSCALL: leave the switch to the thread stack until C, somewhere
late, right before we'd enable IRQs or something.
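
For reference, the "excluded" argument above comes down to a range-overlap
test applied when a data breakpoint is installed. Below is a minimal
user-space sketch of that kind of test. It is not the kernel's code: the
names excluded_area, overlaps_area and breakpoint_allowed are made up for
illustration, ranges are half-open, and address wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a range that must never carry a data
 * breakpoint, e.g. an IST stack or the entry stack. */
struct excluded_area {
	uintptr_t start;	/* first byte of the range */
	size_t size;		/* length of the range in bytes */
};

/* True if [addr, addr + len) overlaps [start, start + size).
 * Assumes neither range wraps around the end of the address space. */
static bool overlaps_area(uintptr_t addr, size_t len,
			  const struct excluded_area *area)
{
	return addr + len > area->start && addr < area->start + area->size;
}

/* A breakpoint is allowed only if it touches none of the excluded areas. */
static bool breakpoint_allowed(uintptr_t addr, size_t len,
			       const struct excluded_area *areas,
			       size_t nareas)
{
	assert(nareas == 0 || areas != NULL);

	for (size_t i = 0; i < nareas; i++) {
		if (overlaps_area(addr, len, &areas[i]))
			return false;
	}
	return true;
}
```

A static list like this only works for ranges that are known up front. The
thread stack is allocated later, possibly after the breakpoint was set,
which is why it can't simply be added to such a list.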