Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible
From: Eric Biggers
Date: Thu Mar 02 2023 - 14:48:17 EST

On Thu, Mar 02, 2023 at 11:38:50AM -0800, Kees Cook wrote:
> On Thu, Mar 02, 2023 at 11:10:03AM -0800, Linus Torvalds wrote:
> > On Thu, Mar 2, 2023 at 11:03 AM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > It might be best if we actually exposed it as a SLAB_SKIP_ZERO thing,
> > > just to make it possible to say - exactly in situations like this -
> > > that this particular slab cache has no advantage from pre-zeroing.
> >
> > Actually, maybe it's just as well to keep it per-allocation, and just
> > special-case getname_flags() itself.
> >
> > We could replace the __getname() there with just a
> >
> > kmem_cache_alloc(names_cachep, GFP_KERNEL | __GFP_SKIP_ZERO);
> >
> > we're going to overwrite the beginning of the buffer with the path we
> > copy from user space, and then we'd have to make people comfortable
> > with the fact that even with zero initialization hardening on, the
> > space after the filename wouldn't be initialized...
>
> Yeah, I'd love to have a way to safely opt-out of always-zero. The
> discussion[1] when we originally did this devolved into a guessing
> game on performance since no one could actually point to workloads
> that were affected by it, beyond skbuff[2]. So in the interest of not
> over-engineering a solution to an unknown problem, the plan was once
> someone found a problem, we could find a sensible solution at that
> time. And so here we are! :)
>
> I'd always wanted to avoid a "don't zero" flag and instead adjust APIs so
> the allocation could include a callback to do the memory content filling
> that would return a size-that-was-initialized result. That way we don't
> end up in the situations we've seen so many times with drivers, etc,
> where an uninit buffer is handed off and some path fails to actually
> fill it with anything. However, in practice, I think this kind of API
> change becomes really hard to do.
>

Having not been following init_on_alloc very closely myself, I'm a bit surprised
that an opt-out flag never made it into the final version.

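Was names_cachep considered in those earlier discussions? I think that's a
pretty obvious use case for an opt-out. Every syscall that operates on a path
allocates a 4K buffer from names_cachep, and with init_on_alloc enabled the
whole buffer gets zeroed even though the path copied in from user space usually
fills only a small part of it.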
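
To make the cost concrete, here is a rough user-space sketch of the two
behaviours being discussed. It is not kernel code: model_getname() and
NAME_BUF_SIZE are hypothetical names made up for illustration, malloc() stands
in for kmem_cache_alloc(names_cachep, ...), and the skip_zero argument only
loosely models passing __GFP_SKIP_ZERO as suggested above.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: models the 4K (PATH_MAX-sized) objects in names_cachep. */
#define NAME_BUF_SIZE 4096

/*
 * Hypothetical user-space stand-in for the allocation in getname_flags().
 * Returns a NAME_BUF_SIZE buffer holding a copy of path, or NULL on failure.
 * *zeroed reports how many bytes were cleared before the copy.
 */
static char *model_getname(const char *path, int skip_zero, size_t *zeroed)
{
	size_t len = strlen(path);
	char *buf;

	*zeroed = 0;

	/* Path plus its NUL terminator must fit in the buffer. */
	if (len >= NAME_BUF_SIZE)
		return NULL;

	buf = malloc(NAME_BUF_SIZE);
	if (!buf)
		return NULL;

	if (!skip_zero) {
		/* Models init_on_alloc: clear all 4K up front. */
		memset(buf, 0, NAME_BUF_SIZE);
		*zeroed = NAME_BUF_SIZE;
	}

	/*
	 * Only len + 1 bytes are actually written by the copy. With
	 * skip_zero set, everything after that stays uninitialized,
	 * which is the trade-off raised earlier in the thread.
	 */
	memcpy(buf, path, len + 1);
	return buf;
}
```

For a path like "/etc/passwd" the copy writes 12 bytes, while the zeroing path
clears 4096 bytes first. Whether that difference matters in practice would
still need to be measured on a real workload.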

- Eric