Re: [PATCH 1/8] kasan,x86: Frob kasan_report() in an exception

From: Peter Zijlstra
Date: Thu Feb 28 2019 - 12:46:26 EST


On Thu, Feb 28, 2019 at 05:03:09PM +0100, Dmitry Vyukov wrote:

> I am missing some knowledge about SMAP to answer this.
> In short, these tools insert lots of callbacks into runtime for memory
> accesses, function entry/exit, atomicops and some other. These
> callbacks can do things of different complexity.
> Humm... perhaps we could just disable SMAP for KMSAN/KTSAN. It's
> possible, right? If we have it enabled with KASAN, that should be
> enough.

SMAP detects access to _PAGE_USER pages; that is, such access is only
allowed when EFLAGS.AC=1, otherwise they'll fault.

I again don't know enough about KASAN to say if it does that; but I
suspect it only tracks kernel memory state.

> Also, what's the actual problem with KASAN+SMAP? Is it warnings from
> static analysis tool? Or there are also some runtime effects? What
> effects?

Both; so because of the above semantics, things like copy_to_user() will
have to do STAC (set EFLAGS.AC=1), then do the actual copies to the user
addresses, and then CLAC (clear the AC flag again).

The desire is to have AC=1 sections as small as possible, such that as
much code as possible is ran with AC=0 and will trap on unintended
accesses.

Also; the scheduler doesn't (but I have a patch for that, but I'd prefer
to not have to use it) context switch EFLAGS. This means that if we land
in the scheduler while AC=1, the next task will resume with AC=1.

Consequently, if that task returns to userspace before it gets scheduled
again, we'll continue our previous task (that left with AC=1) with AC=0
and it'll then fault where no fault were expected.

Anyway; the objtool annotation basically tracks the EFLAGS.AC state
(through STAC/CLAC instructions -- no PUSHF/POPF) and disallows any
CALL/RET while AC=1.

This is where the __asan_{load,store}*() stuff went *splat*. GCC inserts
those calls in the middle of STAC/CLAC (AC=1) and we then have to mark
the functions as AC-safe. objtool validates those on the same rules, no
further CALLs that are not also safe.

Things like __fentry__ are inherently unsafe because they use
preempt_disable/preempt_enable, where the latter has a CALL
__preempt_schedule (and is thus very unsafe). Similarly with
kasan_report(), it does all sorts of things that are not safe to do.

> Is it possible to disable the SMAP runtime checks once we enter
> kasan_report() past report_enabled() check? We could restrict it to
> "just finish printing this bug report whatever it takes and then
> whatever" if it makes things simpler.
> It would be nice if we could restrict it to something like:
>
> @@ -291,6 +303,7 @@ void kasan_report(unsigned long addr, size_t size,
> if (likely(!report_enabled()))
> return;
> + disable_smap();
>
> And then enforce panic at the end of report if smap is enabled.

That would be a CLAC, and the current rules disallow CLAC for AC-safe
functions.

Furthermore, kasan_report() isn't fatal, right? So it would have to
restore the state on exit. That makes the validation state much more
complicated.

Let me try and frob some of the report_enabled() stuff before the #UD.