Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible

From: Alexander Potapenko
Date: Fri Mar 03 2023 - 10:31:08 EST


On Thu, Mar 2, 2023 at 9:11 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 02, 2023 at 11:54:03AM -0800, Kees Cook wrote:
> > On Thu, Mar 02, 2023 at 07:19:49PM +0000, Al Viro wrote:
> > > On Thu, Mar 02, 2023 at 11:10:03AM -0800, Linus Torvalds wrote:
> > > > On Thu, Mar 2, 2023 at 11:03 AM Linus Torvalds
> > > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > It might be best if we actually exposed it as a SLAB_SKIP_ZERO thing,
> > > > > just to make it possible to say - exactly in situations like this -
> > > > > that this particular slab cache has no advantage from pre-zeroing.
> > > >
> > > > Actually, maybe it's just as well to keep it per-allocation, and just
> > > > special-case getname_flags() itself.
> > > >
> > > > We could replace the __getname() there with just a
> > > >
> > > > kmem_cache_alloc(names_cachep, GFP_KERNEL | __GFP_SKIP_ZERO);
> > > >
> > > > we're going to overwrite the beginning of the buffer with the path we
> > > > copy from user space, and then we'd have to make people comfortable
> > > > with the fact that even with zero initialization hardening on, the
> > > > space after the filename wouldn't be initialized...
> > >
> > > ACK; same in getname_kernel() and sys_getcwd(), at the very least.
> >
> > FWIW, much earlier analysis suggested opting out these kmem caches:
> >
> > buffer_head
> > names_cache
> > mm_struct
> > anon_vma
> > skbuff_head_cache
> > skbuff_fclone_cache
>
> I would probably add dentry_cache to it; the only subtle part is
> ->d_iname and I'm convinced that explicit "make sure there's a NUL
> at the very end" is enough.

FWIW, a couple of years ago I was looking into implementing the
following scheme for opt-out that I also discussed with Kees:

1. Add a ___GFP_SKIP_ZERO flag that is not intended to be used
explicitly (e.g. disallow passing it to kmalloc(), add a checkpatch.pl
warning). Explicitly passing an opt-out flag to allocation functions
was considered harmful previously:
https://lore.kernel.org/kernel-hardening/20190423083148.GF25106@xxxxxxxxxxxxxx/

2. Define new allocation API that will allow opt-out:

struct page *alloc_pages_uninit(gfp_t gfp, unsigned int order, const
char *key);
void *kmalloc_uninit(size_t size, gfp_t flags, const char *key);
void *kmem_cache_alloc_uninit(struct kmem_cache *, gfp_t flags,
const char *key);

, where @key is an arbitrary string that identifies a single
allocation or a group of allocations.

3. Provide a boot-time flag that holds a comma-separated list of
opt-out keys that actually take effect:

init_on_alloc.skip="xyz-camera-driver,some_big_buffer".

The rationale behind this two-step mechanism is that despite certain
allocations may be appealing opt-out targets for performance reasons,
some vendors may choose to be on the safe side and explicitly list the
allocations that should not be zeroed.

Several possible enhancements include:
1. A SLAB_NOINIT memcache flag that prohibits cache merging and
disables initialization. Because the number of caches is relatively
small, it might be fine to explicitly pass SLAB_NOINIT at cache
creation sites.
Again, if needed, we could only use this flag as a hint that needs to
be acknowledged by a boot-time option.
2. No opt-out for kmalloc() - instead advise users to promote the
anonymous allocations they want to opt-out to memory caches with
SLAB_NOINIT.
3. Provide an emergency brake that completely disables
___GFP_SKIP_ZERO if a major security breach is discovered.

Extending this idea of per-callsite opt-out we could generate unique
keys for every allocation in the kernel (or e.g. group them together
by the caller name) and decide at runtime if we want to opt them out.
But I am not sure anyone would want this level of control in their distro.