Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix

From: Peter Zijlstra
Date: Thu Nov 29 2018 - 04:46:50 EST


On Fri, Oct 19, 2018 at 07:29:45AM -0700, Andy Lutomirski wrote:
> > On Oct 19, 2018, at 1:33 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> >> On Fri, Oct 19, 2018 at 01:08:23AM +0000, Nadav Amit wrote:
> >> Consider for example do_int3(), and see my inlined comments:
> >>
> >> dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
> >> {
> >> ...
> >> ist_enter(regs); // => preempt_disable()
> >> cond_local_irq_enable(regs); // => assume it enables IRQs
> >>
> >> ...
> >> // resched irq can be delivered here. It will not cause rescheduling
> >> // since preemption is disabled
> >>
> >> cond_local_irq_disable(regs); // => assume it disables IRQs
> >> ist_exit(regs); // => preempt_enable_no_resched()
> >> }
> >>
> >> At this point resched will not happen for an unbounded length of time (unless
> >> there is another point when exiting the trap handler that checks if
> >> preemption should take place).
> >>
> >> Another example is __BPF_PROG_RUN_ARRAY(), which also uses
> >> preempt_enable_no_resched().
> >>
> >> Am I missing something?
> >
> > Would not the interrupt return path then check for TIF_NEED_RESCHED and call
> > schedule()?
>
> The paranoid exit path doesn't check TIF_NEED_RESCHED because it's
> fundamentally atomic: it's running on a percpu stack and it can't
> schedule. In theory we could do some evil stack switching, but we
> don't.
>
> How does NMI handle this? If an NMI that hit interruptible kernel
> code overflows a perf counter, how does the wakeup work?

NMIs should never set NEED_RESCHED. What perf does is self-IPI
(irq_work) and do the wakeup from there.