Re: [PATCH RFC v2 00/21] kasan: hardware tag-based mode for production use on arm64
From: Dmitry Vyukov
Date: Thu Oct 22 2020 - 11:16:10 EST
On Thu, Oct 22, 2020 at 3:19 PM Andrey Konovalov <andreyknvl@xxxxxxxxxx> wrote:
>
> This patchset is not complete (hence sending as RFC), but I would like to
> start the discussion now and hear people's opinions regarding the
> questions mentioned below.
>
> === Overview
>
> This patchset adapts the existing hardware tag-based KASAN mode [1] for
> use in production as a memory corruption mitigation. Hardware tag-based
> KASAN relies on arm64 Memory Tagging Extension (MTE) [2] to perform memory
> and pointer tagging. Please see [3] and [4] for detailed analysis of how
> MTE helps to fight memory safety problems.
>
> The current plan is to reuse CONFIG_KASAN_HW_TAGS for production, but add a
> boot-time switch that allows choosing between a debugging mode that
> includes all KASAN features as they are and a production mode that only
> includes the essentials, like tag checking.
>
> It is essential that switching between these modes doesn't require
> rebuilding the kernel with different configs, as this is required by the
> Android GKI initiative [5].
>
> The patch titled "kasan: add and integrate kasan boot parameters" of this
> series adds a few new boot parameters:
>
> kasan.mode allows choosing one of three main modes:
>
> - kasan.mode=off - no checks at all
> - kasan.mode=prod - only essential production features
> - kasan.mode=full - all features
>
> These modes provide default values for the three additional internal
> settings listed below. However, it's also possible to override the default
> values by providing:
>
> - kasan.stack=off/on - enable stacks collection
> (default: on for mode=full, otherwise off)
> - kasan.trap=async/sync - use async or sync MTE mode
> (default: sync for mode=full, otherwise async)
> - kasan.fault=report/panic - only report MTE fault or also panic
> (default: report)
>
> === Benchmarks
>
> For now I've only performed a few simple benchmarks such as measuring
> kernel boot time and slab memory usage after boot. The benchmarks were
> performed in QEMU and the results below exclude the slowdown caused by
> QEMU memory tagging emulation (as it's different from the slowdown that
> will be introduced by hardware and therefore irrelevant).
>
> KASAN_HW_TAGS=y + kasan.mode=off introduces no performance or memory
> impact compared to KASAN_HW_TAGS=n.
>
> kasan.mode=prod (without executing the tagging instructions) introduces
> a 7% performance and a 7% memory impact compared to kasan.mode=off.
> Note that 4% of the performance impact and all 7% of the memory impact are
> caused by the fact that enabling KASAN essentially results in
> CONFIG_SLAB_MERGE_DEFAULT being disabled.
>
> Recommended Android config has CONFIG_SLAB_MERGE_DEFAULT disabled (I assume
> for security reasons), but Pixel 4 has it enabled. It's arguable whether
> "disabling" CONFIG_SLAB_MERGE_DEFAULT introduces any security benefit on
> top of MTE. Without MTE it makes exploiting some heap corruption harder.
> With MTE it will only make it harder provided that the attacker is able to
> predict allocation tags.
>
> kasan.mode=full has 40% performance and 30% memory impact over
> kasan.mode=prod. Both come from alloc/free stack collection.
>
> === Questions
>
> Any concerns about the boot parameters?
For boot parameters I think we are now "safe" in the sense that we
provide maximum possible flexibility and can defer any actual
decisions.
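
To make the "mode sets defaults, individual parameters override" behaviour
concrete, here is a minimal user-space sketch of how the quoted parameters
could combine. This is not the code from the patch: the struct, enums and
function names are all made up for illustration, and since the cover letter
does not say which mode applies when kasan.mode is absent, the caller passes
that in explicitly.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the quoted parameters; none of these names come
 * from the patchset. */
enum kasan_mode  { MODE_OFF, MODE_PROD, MODE_FULL };
enum kasan_trap  { TRAP_ASYNC, TRAP_SYNC };
enum kasan_fault { FAULT_REPORT, FAULT_PANIC };

struct kasan_cfg {
	enum kasan_mode mode;
	bool stack;
	enum kasan_trap trap;
	enum kasan_fault fault;
};

/* Per-mode defaults as listed in the cover letter: stack collection and
 * sync traps only for mode=full, fault=report everywhere. */
static void cfg_set_mode(struct kasan_cfg *c, enum kasan_mode m)
{
	c->mode = m;
	c->stack = (m == MODE_FULL);
	c->trap = (m == MODE_FULL) ? TRAP_SYNC : TRAP_ASYNC;
	c->fault = FAULT_REPORT;
}

/* Parse a space-separated command line. Returns 0 on success, -1 on an
 * unknown kasan.* value or an over-long line. dflt is the mode assumed
 * when kasan.mode is not given (an assumption of this sketch). */
static int cfg_parse(struct kasan_cfg *c, const char *cmdline,
		     enum kasan_mode dflt)
{
	char buf[256];
	char *tok;

	if (strlen(cmdline) >= sizeof(buf))
		return -1;

	/* Pass 1: settle the mode first, so that the result does not
	 * depend on the order of arguments. */
	cfg_set_mode(c, dflt);
	strcpy(buf, cmdline);
	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (!strcmp(tok, "kasan.mode=off"))
			cfg_set_mode(c, MODE_OFF);
		else if (!strcmp(tok, "kasan.mode=prod"))
			cfg_set_mode(c, MODE_PROD);
		else if (!strcmp(tok, "kasan.mode=full"))
			cfg_set_mode(c, MODE_FULL);
		else if (!strncmp(tok, "kasan.mode=", 11))
			return -1;
	}

	/* Pass 2: explicit overrides win over the mode defaults. */
	strcpy(buf, cmdline);
	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (!strcmp(tok, "kasan.stack=on"))
			c->stack = true;
		else if (!strcmp(tok, "kasan.stack=off"))
			c->stack = false;
		else if (!strcmp(tok, "kasan.trap=sync"))
			c->trap = TRAP_SYNC;
		else if (!strcmp(tok, "kasan.trap=async"))
			c->trap = TRAP_ASYNC;
		else if (!strcmp(tok, "kasan.fault=report"))
			c->fault = FAULT_REPORT;
		else if (!strcmp(tok, "kasan.fault=panic"))
			c->fault = FAULT_PANIC;
		else if (!strncmp(tok, "kasan.", 6) &&
			 strncmp(tok, "kasan.mode=", 11))
			return -1;
	}
	return 0;
}
```

With this model, "kasan.stack=on kasan.mode=prod" yields prod mode with
async traps and report-only faults, but with stack collection turned on,
regardless of the order in which the two arguments appear.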
> Should we try to deal with CONFIG_SLAB_MERGE_DEFAULT-like behavior mentioned
> above?
How hard is it to allow KASAN with CONFIG_SLAB_MERGE_DEFAULT? Are
there any fundamental conflicts?
The numbers you provided look quite substantial (on par with what MTE
itself may introduce). So I would assume that if a vendor does not have
CONFIG_SLAB_MERGE_DEFAULT disabled, it may not want to disable it
because of MTE (it effectively doubles the overhead).
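
As a rough illustration of the "doubles" estimate, the split can be worked
out from the quoted kasan.mode=prod numbers. This is only a sketch: the
helper names are made up, the figures are read as percentage points (7
total, of which 4 for performance and 7 for memory are due to slab merging
being off), and they exclude the cost of the tagging instructions
themselves.

```c
/* Hypothetical helpers; inputs are percentage points of overhead. */

/* Overhead that would remain if slab merging could stay enabled. */
static double remaining_pct(double total_pct, double merge_pct)
{
	return total_pct - merge_pct;
}

/* How many times larger the total is than the remaining part.
 * Returns -1.0 when nothing remains (the ratio is undefined). */
static double overhead_ratio(double total_pct, double merge_pct)
{
	double rest = remaining_pct(total_pct, merge_pct);

	return rest > 0.0 ? total_pct / rest : -1.0;
}
```

For performance, remaining_pct(7.0, 4.0) leaves 3 points, so losing slab
merging makes the measured overhead a bit more than twice as large. For
memory, remaining_pct(7.0, 7.0) is 0, i.e. the whole measured impact comes
from the lack of merging.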