Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
From: Frederic Weisbecker
Date: Wed Aug 19 2015 - 13:10:20 EST
On Tue, Aug 18, 2015 at 04:07:51PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> >> processing:
> >> >>
> >> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
> >> >> was fine), but we could warn if all the debugging options were
> >> >> set.
> >> >
> >> > So this is fixing issues after your changes that call user_exit() from
> >> > IRQs, right?
> >>
> >> Yes. Here's an example splat, courtesy of Sasha:
> >>
> >> https://gist.github.com/sashalevin/a006a44989312f6835e7
> >>
> >> >
> >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> >> > That's where the real issue is.
> >>
> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> >> when entering the kernel for a non-NMI reason.
> >
> > Why? IRQs don't need that! We already have irq_enter()/irq_exit().
> >
>
> Those are certainly redundant.
So? What's the point in duplicating a hook in arch code that core code already
has?
> I want to have a real hook to call
> that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
> context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
> exist.
You're not answering _why_ you want that.
>
> > And we don't want to call rcu_user_*() pairs on IRQs, you're
> > introducing a serious performance regression here! And I'm talking about
> > the code that's currently in -tip.
>
> Is there an easy way to fix it? For example, could we figure out what
> makes it take so long and make it faster?
Sure, just remove your arch IRQ hook.
> If we need to, we could
> back out the IRQ bit and change the assertions for 4.3, but I'd rather
> keep the exact context tracking if at all possible.
I have no idea what you mean by exact context tracking here.
But if we ever want to call irq_enter() from arch hooks (and I have no idea why
we would ever want to do that, since it involves complicating the code
by $NR_ARCHS and moving C code to ASM), we need serious reasons! And that's
certainly not something we are going to plan now for next week's merge window.
> >> That means that we can
> >> avoid all of the (expensive!) checks for what context we're in.
> >
> > If you're referring to context tracking, the context check is a per-cpu
> > read. Not something that's usually considered expensive.
>
> In -tip, there aren't even extra branches, except those imposed by the
> user_exit implementation.
No, there is the "call enter_from_user_mode" in the IRQ fast path.
>
> >
> >> It also means that (other than IRQs, which need further cleanup), we only
> >> switch once per user/kernel switch.
> >
> > ???
>
> In 4.2 and before, we can switch multiple times on the way out of the
> kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it
> exactly once no matter what.
That's what we want for syscalls but not for IRQs.
>
> >
> >>
> >> The cost for doing should be essentially zero, modulo artifacts from
> >> poor inlining.
> >
> > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> > implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> > the existing one in irq_enter/exit() (those even imply a lock currently, which will
> > probably be turned to seqcount, but still, full memory barriers...).
> >
> > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> > concerns current tip:x86/asm).
>
> Why do we need these heavyweight barriers?
Actually they're not full barriers but atomic ones (smp_mb__after_atomic_stuff()).
I suspect we can't do much better given RCU requirements.
Still, we don't need to call it twice.
>
> If there's actually a measurable performance hit in IRQs in -tip, then
> can we come up with a better fix?
I'm sure it's very easily measurable.
> For example, we could change all
> the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
> context" and make the IRQ entry do a lighter weight context tracking
> operation.
I don't see what we need to check actually. Context tracking can be in any
state while in IRQ.
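
For reference, the relaxed assertion proposed above would amount to something
like the sketch below. This is a userspace toy, not the CT_WARN_ON
implementation: the enum, toy_ct_state_ok() and toy_ct_assert() are
hypothetical names made up for illustration. It shows that once IRQ context is
accepted, the context-tracking state no longer matters there.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: hypothetical names throughout. */
enum toy_ctx_state { TOY_CONTEXT_KERNEL, TOY_CONTEXT_USER };

/*
 * Relaxed check: accept the kernel state, or any state at all when
 * running in IRQ context.
 */
static bool toy_ct_state_ok(enum toy_ctx_state state, bool in_irq)
{
	return in_irq || state == TOY_CONTEXT_KERNEL;
}

/* Assertion-style wrapper, loosely mirroring how a warn-on would be used. */
static void toy_ct_assert(enum toy_ctx_state state, bool in_irq)
{
	assert(toy_ct_state_ok(state, in_irq));
}
```

With in_irq set, the predicate is true for every state, which is the case
where there is nothing left to check.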
>
> But I think I'm still missing something fundamental about the
> performance: why is irq_enter() any faster than user_exit()?
It's slightly faster, at least because it takes care of nested IRQs, which
are likely with softirqs that get interrupted.
Now of course we wouldn't call user_exit() in this case, but the hook is there
in generic code, no need for anything from the arch.
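
To illustrate the nesting point, here is a minimal userspace sketch. It is not
kernel code: toy_irq_enter(), toy_irq_exit() and the two counters are
hypothetical names, and toy_eqs_transitions merely stands in for the costly
RCU/vtime transition. The assumption being modelled is that only the outermost
entry and exit pay that cost, while nested ones just adjust a depth counter.

```c
#include <assert.h>

/* Toy model, not kernel code: every name here is hypothetical. */
static int toy_irq_nesting;      /* current IRQ nesting depth on this "CPU" */
static int toy_eqs_transitions;  /* stand-in for costly RCU/vtime transitions */

/* Entering an IRQ: only the outermost entry pays for the transition. */
static void toy_irq_enter(void)
{
	if (toy_irq_nesting++ == 0)
		toy_eqs_transitions++;
}

/* Leaving an IRQ: only the outermost exit pays for the transition. */
static void toy_irq_exit(void)
{
	assert(toy_irq_nesting > 0);  /* exits must pair with entries */
	if (--toy_irq_nesting == 0)
		toy_eqs_transitions++;
}

/* Model an IRQ that arrives while another one is still being handled. */
static int toy_nested_irq_cost(void)
{
	toy_eqs_transitions = 0;
	toy_irq_enter();  /* outer IRQ */
	toy_irq_enter();  /* nested IRQ: depth bump only */
	toy_irq_exit();   /* nested exit: depth drop only */
	toy_irq_exit();   /* outer exit */
	return toy_eqs_transitions;
}
```

In this model the nested pair adds no transitions: the whole sequence costs
two, the same as a single non-nested IRQ.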