Re: [PATCH RFC v2 00/21] kasan: hardware tag-based mode for production use on arm64

From: Andrey Konovalov
Date: Thu Oct 22 2020 - 13:00:57 EST


On Thu, Oct 22, 2020 at 5:16 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>
> On Thu, Oct 22, 2020 at 3:19 PM Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
> >
> > This patchset is not complete (hence sending as RFC), but I would like to
> > start the discussion now and hear people's opinions regarding the
> > questions mentioned below.
> >
> > === Overview
> >
> > This patchset adopts the existing hardware tag-based KASAN mode [1] for
> > use in production as a memory corruption mitigation. Hardware tag-based
> > KASAN relies on arm64 Memory Tagging Extension (MTE) [2] to perform memory
> > and pointer tagging. Please see [3] and [4] for detailed analysis of how
> > MTE helps to fight memory safety problems.
> >
> > The current plan is reuse CONFIG_KASAN_HW_TAGS for production, but add a
> > boot time switch, that allows to choose between a debugging mode, that
> > includes all KASAN features as they are, and a production mode, that only
> > includes the essentials like tag checking.
> >
> > It is essential that switching between these modes doesn't require
> > rebuilding the kernel with different configs, as this is required by the
> > Android GKI initiative [5].
> >
> > The patch titled "kasan: add and integrate kasan boot parameters" of this
> > series adds a few new boot parameters:
> >
> > kasan.mode allows choosing one of main three modes:
> >
> > - kasan.mode=off - no checks at all
> > - kasan.mode=prod - only essential production features
> > - kasan.mode=full - all features
> >
> > Those mode configs provide default values for three more internal configs
> > listed below. However it's also possible to override the default values
> > by providing:
> >
> > - kasan.stack=off/on - enable stacks collection
> > (default: on for mode=full, otherwise off)
> > - kasan.trap=async/sync - use async or sync MTE mode
> > (default: sync for mode=full, otherwise async)
> > - kasan.fault=report/panic - only report MTE fault or also panic
> > (default: report)
> >
> > === Benchmarks
> >
> > For now I've only performed a few simple benchmarks such as measuring
> > kernel boot time and slab memory usage after boot. The benchmarks were
> > performed in QEMU and the results below exclude the slowdown caused by
> > QEMU memory tagging emulation (as it's different from the slowdown that
> > will be introduced by hardware and therefore irrelevant).
> >
> > KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory
> > impact compared to KASAN_HW_TAGS=n.
> >
> > kasan.mode=prod (without executing the tagging instructions) introduces
> > 7% of both performace and memory impact compared to kasan.mode=off.
> > Note, that 4% of performance and all 7% of memory impact are caused by the
> > fact that enabling KASAN essentially results in CONFIG_SLAB_MERGE_DEFAULT
> > being disabled.
> >
> > Recommended Android config has CONFIG_SLAB_MERGE_DEFAULT disabled (I assume
> > for security reasons), but Pixel 4 has it enabled. It's arguable, whether
> > "disabling" CONFIG_SLAB_MERGE_DEFAULT introduces any security benefit on
> > top of MTE. Without MTE it makes exploiting some heap corruption harder.
> > With MTE it will only make it harder provided that the attacker is able to
> > predict allocation tags.
> >
> > kasan.mode=full has 40% performance and 30% memory impact over
> > kasan.mode=prod. Both come from alloc/free stack collection.

FTR, this only accounts for slab memory overhead that comes from
redzones that store stack ids. There's also page_alloc overhead from
the stacks themselves, which I didn't measure yet.

> >
> > === Questions
> >
> > Any concerns about the boot parameters?
>
> For boot parameters I think we are now "safe" in the sense that we
> provide maximum possible flexibility and can defer any actual
> decisions.

Perfect!

I realized that I actually forgot to think about the default values
when no boot params are specified, I'll fix this in the next version.

> > Should we try to deal with CONFIG_SLAB_MERGE_DEFAULT-like behavor mentioned
> > above?
>
> How hard it is to allow KASAN with CONFIG_SLAB_MERGE_DEFAULT? Are
> there any principal conflicts?

I'll explore this.

> The numbers you provided look quite substantial (on a par of what MTE
> itself may introduce). So I would assume if a vendor does not have
> CONFIG_SLAB_MERGE_DEFAULT disabled, it may not want to disable it
> because of MTE (effectively doubles overhead).

Sounds reasonable.

Thanks!