Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts
From: Andy Lutomirski
Date: Tue Aug 18 2015 - 18:35:54 EST
On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
>> was fine), but we could warn if all the debugging options were
>> set.
>
> So this is fixing issues after your changes that call user_exit() from
> IRQs, right?
Yes. Here's an example splat, courtesy of Sasha:
https://gist.github.com/sashalevin/a006a44989312f6835e7
>
> But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> That's where the real issue is.
In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
when entering the kernel for a non-NMI reason. That means that we can
avoid all of the (expensive!) checks for what context we're in. It
also means that (other than IRQs, which need further cleanup), we only
switch once per user/kernel switch.
The cost for doing this should be essentially zero, modulo artifacts from
poor inlining. IMO the code is much more straightforward than it used
to be, and it has the potential to be quite fast. For one thing, we
never invoke context tracking with IRQs on, and Rik had some profiles
suggesting that a bunch of the overhead involved dealing with repeated
irq flag manipulation.
One way or another, IRQs need to switch from RCU-not-watching to
RCU-watching, and I don't see what's wrong with user_exit for this
purpose. Of course, if user_exit is slow, we should fix that.
Also, this isn't really related to IRQs calling user_exit. It's that
IRQs can recurse into other entries (#GP in Sasha's case) which also
validate the context.
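To make the ordering issue concrete, here is a minimal userspace sketch.
It is a toy model with hypothetical names (the model_* helpers are
invented for illustration), not the -tip code: it only tracks a single
state variable and shows why a nested entry that validates the context
is unhappy when an IRQ lands before the syscall path has switched to
CONTEXT_KERNEL.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the names echo the thread, but none of this is kernel code. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

static enum ctx_state ctx = CONTEXT_USER;

/* Models the one user->kernel switch done at entry (what user_exit() does). */
static void model_enter_from_user(void)
{
	ctx = CONTEXT_KERNEL;
}

/* Models the switch back on the way out to user mode. */
static void model_return_to_user(void)
{
	ctx = CONTEXT_USER;
}

/*
 * Models the check a nested entry (e.g. a #GP taken inside an IRQ) makes.
 * Returns true if the context is what the entry code assumes.
 */
static bool model_context_ok(void)
{
	return ctx == CONTEXT_KERNEL;
}

/*
 * Simulates one syscall with an IRQ arriving as early as possible, where
 * the IRQ handler takes a nested exception that validates the context.
 * track_before_irqs_on == true  : switch context, then allow IRQs.
 * track_before_irqs_on == false : allow IRQs, then switch context.
 * Returns true if the nested check passed (no warning).
 */
static bool model_syscall_with_early_irq(bool track_before_irqs_on)
{
	bool ok;

	ctx = CONTEXT_USER;			/* we arrive from user mode */

	if (track_before_irqs_on)
		model_enter_from_user();	/* switch before the window opens */

	/* IRQs are on from here; the early IRQ and its nested entry run now. */
	ok = model_context_ok();

	if (!track_before_irqs_on)
		model_enter_from_user();	/* switch only after the window */

	model_return_to_user();
	return ok;
}
```

Calling model_syscall_with_early_irq(false) reports a failed check,
while model_syscall_with_early_irq(true) does not, which is the whole
point of context-tracking before enabling interrupts.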
None of the speedups that will be enabled are written yet, but I
strongly suspect they will be soon :)
In my book, the fact that we now have context tracking assertions all
over the place is a good thing. It means we're much less likely to
break it.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC