Re: [PATCH] mm: make __GFP_SKIP_ZERO visible to skip zero operation

From: Alexander Potapenko
Date: Fri Sep 01 2023 - 08:56:04 EST


On Fri, Sep 1, 2023 at 12:29 PM Zhaoyang Huang <huangzhaoyang@xxxxxxxxx> wrote:
>
> loop alex

(adding more people who took part in the previous discussions)

>
> On Thu, Aug 31, 2023 at 8:16 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > >
> > > There is no explicit gfp flags to let the allocation skip zero
> > > operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> > > __GFP_SKIP_ZERO be visible even if kasan is not configured.

Hi all,

This is a recurring question, as people keep encountering performance
problems on systems with init_on_alloc=1
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1862822 being
one of the examples).

Brad Spengler has also pointed out
(https://twitter.com/spendergrsec/status/1296461651659694082) that
there are cases where there is no security vs. performance tradeoff
(e.g. kmemdup() and kstrdup()).

An opt-out flag was included in the initial init_on_alloc series, but
back then Michal Hocko has noted that it might easily get out of
control: https://patchwork.kernel.org/project/linux-hardening/patch/20190418154208.131118-2-glider@xxxxxxxxxx/#22600229.

Now that init_on_alloc is actually being used by people which may have
different preferences wrt. security and performance (in the cases
where this tradeoff exists), we must be very careful with the opt-out
GFP flag. Not initializing a particular allocation site in the
upstream kernel will affect every downstream user, and some may
consider this a security regression.
Another problematic case is an OS vendor mandating init_on_alloc
everywhere, but a third party driver vendor doing kmalloc(...,
__GFP_SKIP_ZERO) for their allocations.

So I think a working opt-out scheme for the heap initialization should
be two-step:
1. The code owner may decide that a particular allocation site needs
an opt-out, and make the upstream code change;
2. The OS vendor has the ability to override that decision for the
kernel they ship without the need to patch the source.

Let me quoute the idea briefly outlined at
https://lore.kernel.org/lkml/CAG_fn=UQEuvJ9WXou_sW3moHcVQZJ9NvJ5McNcsYE8xw_WEYGw@xxxxxxxxxxxxxx/
(unfortunately the discussion got derailed a bit):

"""
1. Add a ___GFP_SKIP_ZERO flag that is not intended to be used
explicitly (e.g. disallow passing it to kmalloc(), add a checkpatch.pl
warning). Explicitly passing an opt-out flag to allocation functions
was considered harmful previously:
https://lore.kernel.org/kernel-hardening/20190423083148.GF25106@xxxxxxxxxxxxxx/

2. Define new allocation API that will allow opt-out:
struct page *alloc_pages_uninit(gfp_t gfp, unsigned int order, const
char *key);
void *kmalloc_uninit(size_t size, gfp_t flags, const char *key);
void *kmem_cache_alloc_uninit(struct kmem_cache *, gfp_t flags,
const char *key);
, where @key is an arbitrary string that identifies a single
allocation or a group of allocations.

3. Provide a boot-time flag that holds a comma-separated list of
opt-out keys that actually take effect:
init_on_alloc.skip="xyz-camera-driver,some_big_buffer".
"""

A draft implementation at
https://github.com/ramosian-glider/linux/commit/00791be14eb1113eae615c74b652f94b5cc3c336
(which probably does not apply anymore) may give some insight into how
this is supposed to work.
There's plenty of room for bikeshedding here (does the command-line
flag opt-in or opt-out? should we use function names instead of some
"keys"? can we allow overriding every allocation site without the need
for alloc_pages_uninit()?), but if the overall scheme is viable I can
probably proceed with an RFC.

> >
> > This bypasses a security feature so you're going to have to do a little
> > better than "I want it".
> Thanks for pointing this out. What I want to do is to give the user a
> way to exempt some types of pages from being zeroed, which could help
> on performance issues. Could we have the most safety concern admin
> use INIT_ON_FREE while the less concerned use INIT_ON_ALLOC &
> __GFP_SKIP_ZERO as a light version method?

init_on_free has a more significant performance impact, and might be
more problematic for production use (even more opt-outs would've been
needed).

As a side note, I don't think we should repurpose the same
__GFP_SKIP_ZERO flag used by KASAN, because the semantics of the flags
may not match 1:1.