Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

From: Peter Zijlstra
Date: Sat Jun 26 2021 - 04:29:14 EST


On Sat, Jun 26, 2021 at 09:03:23AM +0200, Thomas Gleixner wrote:
> On Fri, Jun 25 2021 at 13:00, Peter Zijlstra wrote:
> > On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> >> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> >> >
> >> >
> >> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> >> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> >> > > > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> >> > > >
> >> > > > The kernel currently has no code that prevents a data breakpoint from
> >> > > > being placed on the thread stack. If there is a data breakpoint on the
> >> > > > top area of the thread stack, there might be a problem.
> >> > >
> >> > > And because the kernel does not prevent data breakpoints on the thread
> >> > > stack we need to do more complicated things in the already horrible
> >> > > entry code instead of just doing the obvious and preventing data
> >> > > breakpoints on the thread stack?
> >> >
> >> > Preventing breakpoints on the thread stack is a bit messy: it’s
> >> > possible for a breakpoint to be set before the address in question is
> >> > allocated for the thread stack.
> >>
> >> How about we call into C from the entry stack and have the from-user
> >> stack swizzle there. The from-kernel entries land on the ISTs and those
> >> are already excluded.
> >>
> >> > None of this is NMI-specific. #DB itself has the same problem. We
> >> > could plausibly solve it differently by disarming breakpoints in the
> >> > entry asm before switching stacks. I’m not sure how much I like that
> >> > approach.
> >>
> >> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> >> we recurse, we'll get a from-kernel trap, which will land on the IST,
> >> which is excluded, and then we clear DR7 there.
> >>
> >> IST and entry stack are excluded, the only problem we have is thread
> >> stack, and that can be solved by calling into C from the entry stack.
> >>
> >> I should put teaching objtool about .data references from .noinstr.text
> >> and .entry.text higher on the todo list I suppose ...
> >
> > Also, I think we can run the from-user exceptions on the entry stack,
> > without ever switching to the kernel stack, except for #PF, which is
> > magical and schedules.
>
> No. Pretty much any exception coming from user space can schedule and
> even if it does not schedule voluntarily, it can be preempted.

Won't most of them have IRQs disabled throughout? In any case, I think
we should only switch to the task stack right around the time we're
ready to enable IRQs just like for syscall/#PF, not earlier.
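
To make the stack policy discussed above concrete, here is a toy userspace
model. It is only a sketch of the policy as described in this thread, not
kernel code: the enum, the function name and its parameters are all made up
for illustration, and it assumes that from-kernel entries stay on the IST
while from-user entries start on the entry stack.

```c
#include <stdbool.h>

/* Hypothetical names; this models the policy only, not real entry code. */
enum stack {
	ENTRY_STACK,	/* per-CPU entry stack, excluded from breakpoints */
	IST_STACK,	/* IST stack, also excluded */
	THREAD_STACK,	/* task stack, may carry a data breakpoint */
};

/*
 * Pick the stack the C handler runs on.
 *
 * from_user:   the exception interrupted user mode.
 * enable_irqs: the handler is about to enable IRQs (and may therefore
 *              schedule or be preempted).
 */
enum stack handler_stack(bool from_user, bool enable_irqs)
{
	/* From-kernel entries stay on the IST. */
	if (!from_user)
		return IST_STACK;

	/* From-user: move to the task stack only when IRQs get enabled. */
	return enable_irqs ? THREAD_STACK : ENTRY_STACK;
}
```

In this model the thread stack is only ever touched on the from-user path,
and only at the point where IRQs are about to be enabled.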