Re: [PATCH v3 2/2] vfs: avoid duplicating creds in faccessat if possible

From: Al Viro
Date: Thu Mar 02 2023 - 13:41:54 EST


On Thu, Mar 02, 2023 at 06:18:32PM +0000, Al Viro wrote:
> On Thu, Mar 02, 2023 at 07:14:24PM +0100, Mateusz Guzik wrote:
> > On 3/2/23, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, Mar 2, 2023 at 12:30 AM Christian Brauner <brauner@xxxxxxxxxx>
> > > wrote:
> > >>
> > >> Fwiw, as long as you, Al, and others are fine with it and I'm aware of
> > >> it I'm happy to pick up more stuff like this. I've done it before and
> > >> have worked in this area so I'm happy to help with some of the load.
> > >
> > > Yeah, that would work. We've actually had discussions of vfs
> > > maintenance in general.
> > >
> > > In this case it really wasn't an issue, with this being just two
> > > fairly straightforward patches for code that I was familiar with.
> > >
> >
> > Well on that note I intend to write a patch which would add a flag to
> > the dentry cache.
> >
> > There is this thing named CONFIG_INIT_ON_ALLOC_DEFAULT_ON which is
> > enabled at least on debian, ubuntu and arch. It results in mandatory
> > memset on the obj before it gets returned from kmalloc, for hardening
> > purposes.
> >
> > Now the problem is that dentry cache allocates bufs 4096 bytes in
> > size, so this is an equivalent of a clear_page call and it happens
> > *every time* there is a path lookup.
>
> Huh? Quite a few path lookups don't end up allocating any dentries;
> what exactly are you talking about?
>
> > Given how dentry cache is used, I'm confident there is 0 hardening
> > benefit for it.
> >
> > So the plan would be to add a flag on cache creation to exempt it from
> > the mandatory memset treatment and use it with dentry.
> >
> > I don't have numbers handy but as you can imagine it gave me a nice bump :)

Hmm... Let's see what __d_alloc() will explicitly store into:
[*] unsigned int d_flags;
[*] seqcount_spinlock_t d_seq;
[*] struct hlist_bl_node d_hash;
[*] struct dentry *d_parent;
[*] struct qstr d_name;
[*] struct inode *d_inode;
unsigned char d_iname[DNAME_INLINE_LEN];
[*] struct lockref d_lockref;
[*] const struct dentry_operations *d_op;
[*] struct super_block *d_sb;
unsigned long d_time;
[*] void *d_fsdata;
union {
[*] struct list_head d_lru;
wait_queue_head_t *d_wait;
};
[*] struct list_head d_child;
[*] struct list_head d_subdirs;
union {
struct hlist_node d_alias;
struct hlist_bl_node d_in_lookup_hash;
struct rcu_head d_rcu;
} d_u;

These are straightforward "overwrite no matter what". ->d_time is not
initialized at all - it's fs-private, and unused for the majority
of filesystems (kernfs, nfs and vboxsf are the only users).
->d_alias/->d_in_lookup_hash/->d_rcu are also uninitialized - those
are valid only on some stages of dentry lifecycle (which is why
they can share space) and initialization is responsibility of
the places that do the corresponding state transitions.

->d_iname is the only somewhat delicate one. I think we are OK as
it is (i.e. that having all of it zeroed out won't buy anything), but
that's not trivial. Note that the last byte does need to be
initialized, despite the
memcpy(dname, name->name, name->len);
dname[name->len] = 0;
following it.

I'm not saying that getting rid of pre-zeroing the entire underlying
page is the wrong thing to do; it's just that it needs some analysis in
commit message.