Re: [PATCH 1/8] kasan,x86: Frob kasan_report() in an exception

From: Dmitry Vyukov
Date: Thu Feb 28 2019 - 13:18:30 EST


On Thu, Feb 28, 2019 at 6:46 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Feb 28, 2019 at 05:03:09PM +0100, Dmitry Vyukov wrote:
>
> > I am missing some knowledge about SMAP to answer this.
> > In short, these tools insert lots of callbacks into runtime for memory
> > accesses, function entry/exit, atomicops and some other. These
> > callbacks can do things of different complexity.
> > Humm... perhaps we could just disable SMAP for KMSAN/KTSAN. It's
> > possible, right? If we have it enabled with KASAN, that should be
> > enough.
>
> SMAP detects access to _PAGE_USER pages; that is, such access is only
> allowed when EFLAGS.AC=1, otherwise they'll fault.
>
> I again don't know enough about KASAN to say if it does that; but I
> suspect it only tracks kernel memory state.
>
> > Also, what's the actual problem with KASAN+SMAP? Is it warnings from
> > static analysis tool? Or there are also some runtime effects? What
> > effects?
>
> Both; so because of the above semantics, things like copy_to_user() will
> have to do STAC (set EFLAGS.AC=1), then do the actual copies to the user
> addresses, and then CLAC (clear the AC flag again).
>
> The desire is to have AC=1 sections as small as possible, such that as
> much code as possible is ran with AC=0 and will trap on unintended
> accesses.
>
> Also; the scheduler doesn't (but I have a patch for that, but I'd prefer
> to not have to use it) context switch EFLAGS. This means that if we land
> in the scheduler while AC=1, the next task will resume with AC=1.
>
> Consequently, if that task returns to userspace before it gets scheduled
> again, we'll continue our previous task (that left with AC=1) with AC=0
> and it'll then fault where no fault were expected.
>
> Anyway; the objtool annotation basically tracks the EFLAGS.AC state
> (through STAC/CLAC instructions -- no PUSHF/POPF) and disallows any
> CALL/RET while AC=1.
>
> This is where the __asan_{load,store}*() stuff went *splat*. GCC inserts
> those calls in the middle of STAC/CLAC (AC=1) and we then have to mark
> the functions as AC-safe. objtool validates those on the same rules, no
> further CALLs that are not also safe.
>
> Things like __fentry__ are inherently unsafe because they use
> preempt_disable/preempt_enable, where the latter has a CALL
> __preempt_schedule (and is thus very unsafe). Similarly with
> kasan_report(), it does all sorts of things that are not safe to do.
>
> > Is it possible to disable the SMAP runtime checks once we enter
> > kasan_report() past report_enabled() check? We could restrict it to
> > "just finish printing this bug report whatever it takes and then
> > whatever" if it makes things simpler.
> > It would be nice if we could restrict it to something like:
> >
> > @@ -291,6 +303,7 @@ void kasan_report(unsigned long addr, size_t size,
> > if (likely(!report_enabled()))
> > return;
> > + disable_smap();
> >
> > And then enforce panic at the end of report if smap is enabled.
>
> That would be a CLAC, and the current rules disallow CLAC for AC-safe
> functions.
>
> Furthermore, kasan_report() isn't fatal, right? So it would have to
> restore the state on exit. That makes the validation state much more
> complicated.
>
> Let me try and frob some of the report_enabled() stuff before the #UD.

What if do CLAC in the beginning of kasan_report, STAC at the end if
AC was set on entry. And then special-case kasan_report in the static
tool? This looks like a reasonable compromise to me taking into
account that KASAN is not enabled in production builds, so special
casing it in the static tool won't lead to missed bugs in production
code. That's effectively what _BUG_FLAGS will do, right? But just
without involving asm.
Or, perhaps, wrap it into something like:
void smap_call(void (*fn)(void* ctx), void *ctx);
that will do all of this. And then the tool only needs to know about
the smap_call?