Re: [PATCH RFC 0/8] kasan: hardware tag-based mode for production use on arm64

From: Andrey Konovalov
Date: Tue Oct 20 2020 - 08:13:18 EST


On Tue, Oct 20, 2020 at 7:34 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Tue, Oct 20, 2020 at 12:51 AM Kostya Serebryany <kcc@xxxxxxxxxx> wrote:
> >
> > Hi,
> > I would like to hear opinions from others in CC on these choices:
> > * Production use of In-kernel MTE should be based on stripped-down
> > KASAN, or implemented independently?
>
> Andrey, what are the fundamental consequences of basing MTE on KASAN?
> I would assume that there are none as we can change KASAN code and
> special case some code paths as necessary.

The main consequence is psychological and manifests in inheriting the name :)

But generally you're right. As we can change KASAN code, we can do
whatever we want, like adding fast paths for MTE, etc. If we Ctrl+C
Ctrl+V KASAN common code, we could potentially do some micro
optimizations (like avoiding a couple of checks), but I doubt that
will make any difference.

> > * Should we aim at a single boot-time flag (with several values) or
> > for several independent flags (OFF/SYNC/ASYNC, Stack traces on/off)
>
> We won't be able to answer this question for several years until we
> have actual hardware/users...
> It's definitely safer to aim at multiple options. I would reuse the fs
> opt parsing code as we seem to have lots of potential things to
> configure so that we can do:
> kasan_options=quarantine=off,fault=panic,trap=async
>
> I am also always confused by the term "debug" when configuring the
> kernel. In some cases it's for debugging of the subsystem (for
> developers of KASAN), in some cases it adds additional checks to catch
> misuses of the subsystem. in some - it just adds more debugging output
> on console. And in this case it's actually neither of these. But I am
> not sure what's a better name ("full"?). Even if we split options into
> multiple, we still can have some kind of presents that just flip all
> other options into reasonable values.

OK, let me try to incorporate the feedback I've heard so far into the
next version.

>
> > Andrey, please give us some idea of the CPU and RAM overheads other
> > than those coming from MTE
> > * stack trace collection and storage
> > * adding redzones to every allocation - not strictly needed for MTE,
> > but convenient to store the stack trace IDs.
> >
> > Andrey: with production MTE we should not be using quarantine, which
> > means storing the stack trace IDs
> > in the deallocated memory doesn't provide good report quality.
> > We may need to consider another approach, e.g. the one used in HWASAN
> > (separate ring buffer, per thread or per core)

My current priority is cleaning up the mode where stack traces are
disabled and estimating the slowdown from KASAN callbacks. Once done
with that, I'll switch to these ones.