Re: [PATCH v2 1/2] kasan: allow sampling page_alloc allocations for HW_TAGS
From: Andrey Konovalov
Date: Wed Nov 30 2022 - 14:51:58 EST
.On Tue, Nov 29, 2022 at 12:30 PM Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Sat, 26 Nov 2022 at 20:12, <andrey.konovalov@xxxxxxxxx> wrote:
> >
> > From: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> >
> > Add a new boot parameter called kasan.page_alloc.sample, which makes
> > Hardware Tag-Based KASAN tag only every Nth page_alloc allocation for
> > allocations marked with __GFP_KASAN_SAMPLE.
>
> This is new - why was it decided that this is a better design?
Sampling all page_alloc allocations (with the suggested frequency of 1
out of 10) effectively means that KASAN/MTE is no longer mitigation
for page_alloc corruptions. The idea here was to only apply sampling
to selected allocations, so that all others are still checked
deterministically.
However, it's hard to say whether this is critical from the security
perspective. Most exploits today corrupt slab objects, not page_alloc.
> This means we have to go around introducing the GFP_KASAN_SAMPLE flag
> everywhere where we think it might cause a performance degradation.
>
> This depends on accurate benchmarks. Yet, not everyone's usecases will
> be the same. I fear we might end up with marking nearly all frequent
> and large page-alloc allocations with GFP_KASAN_SAMPLE.
>
> Is it somehow possible to make the sampling decision more automatic?
>
> E.g. kasan.page_alloc.sample_order -> only sample page-alloc
> allocations with order greater or equal to sample_order.
Hm, perhaps this could be a good middle ground between sampling all
allocations and sprinkling GFP_KASAN_SAMPLE.
Looking at the networking code, most multi-page data allocations are
done with the order of 3 (either via PAGE_ALLOC_COSTLY_ORDER or
SKB_FRAG_PAGE_ORDER). So this would be the required minimum value for
kasan.page_alloc.sample_order to alleviate the performance impact for
the networking workloads.
I measured the number of allocations for each order from 0 to 8 during
boot in my test build:
7299 867 318 206 86 8 7 5 2
So sampling with kasan.page_alloc.sample_order=3 would affect only ~7%
of page_alloc allocations that happen normally, which is not bad. (Of
course, if an attacker can control the size of the allocation, they
can increase the order to enable sampling.)
I'll do some more testing and either send a v3 with this approach or
get back to this discussion.
Thanks for the suggestion!