Re: [PATCH] sched/x86: Save [ER]FLAGS on context switch

From: H. Peter Anvin
Date: Mon Feb 18 2019 - 17:31:06 EST


On 2/16/19 2:30 AM, Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 08:06:56PM -0800, hpa@xxxxxxxxx wrote:
>> This implies we invoke schedule -- a restricted operation (consider
>> may_sleep) during execution of STAC-enabled code, but *not* as an
>> exception or interrupt, since those preserve the flags.
>
> Meet preempt_enable().

I believe this falls under "doctor, it hurts when I do that." And it hurts for
very good reasons. See below.

>> I have serious concerns about this. This is more or less saying that
>> we have left an unlimited gap and have had AC escape.
>
> Yes; by allowing normal C in between STAC and CLAC this is so.
>
>> Does *anyone* see a need to allow this? I got a question at LPC from
>> someone about this, and what they were trying to do once all the
>> layers had been unwound was so far down the wrong track for a root
>> problem that actually has a very simple solution.
>
> Have you read the rest of the thread?
>
> All it takes for this to explode is a call to a C function that has
> tracing on it in between the user_access_begin() and user_access_end()
> things. That is a _very_ easy thing to do.
>
> Heck, GCC is allowed to generate that broken code compiling
> lib/strnlen_user.c; it doesn't _have_ to inline do_strnlen_user (esp.
> with CONFIG_OPTIMIZE_INLINING), and making that a function call would
> get us fentry hooks and ftrace and *BOOM*.
>
> (Now, of course, its a static function with a single caller, and GCC
> isn't _THAT_ stupid, but it could, if it wanted to)
>
> Since the bar is _that_ low for mistakes, I figure we'd better fix it.
>

The question is what "fix it" means. I'm really concerned about AC escapes,
and everyone else should be, too.

For an issue specific to tracing, it would be more appropriate to do the
appropriate things in the tracing entry/exit than in schedule. Otherwise, we
don't want to silently paper over mistakes which could mean that we run a
large portion of the kernel without protection we thought we had.

In that sense, calling schedule() with AC set is in fact a great place to have
a WARN() or even BUG(), because such an event really could imply that the
kernel has been security compromised.

Does that make more sense?

-hpa