Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds
From: Vinicius Costa Gomes
Date: Fri Dec 15 2023 - 15:00:49 EST
Hi Amir,
Amir Goldstein <amir73il@xxxxxxxxx> writes:
> +fsdevel because this may be relevant to any subsystem that
> keeps a long live cred copy (e.g. nfsd, ksmbd, cachefiles).
>
> +linus who wrote
> d7852fbd0f04 access: avoid the RCU grace period for the temporary
> subjective credentials
>
>
> On Fri, Dec 15, 2023 at 12:02 AM Vinicius Costa Gomes
> <vinicius.gomes@xxxxxxxxx> wrote:
>>
>> Permission checks in overlayfs also check against the credentials
>> associated with the superblock, which are assigned at mount() time, so
>> pretty long lived. So, we can omit the reference counting for this
>> case.
>
> You forgot to mention WHY you are proposing this and to link to the
> original report with the first optimization attempt:
>
> https://lore.kernel.org/linux-unionfs/20231018074553.41333-1-hu1.chen@xxxxxxxxx/
>
I thought that the "in-reply-to" would do that, I should have been more
explicit on the context. Sorry.
>>
>> This (very early) proof of concept does two things:
>>
>> Add a flag "immutable" (TODO: find a better name) to struct cred to
>> indicate that it is long lived, and that refcount can be omitted.
>>
>
> This reminds me of the many discussions about Rust abstractions
> that are going on right now.
> I think an abstraction like this one is called a "borrowed reference".
>
Yeah, very similar to a borrow in rust.
>> Add "guard" helpers, so we can use automatic cleanup to be sure
>> override/restore are always paired. (I didn't like that I have
>> 'ovl_cred' to be a container for the credentials, but couldn't think
>> of other solutions)
>>
>
> I like the guard but see comments below...
>
>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx>
>> ---
>> Hi Amir,
>>
>> Just to know if I am more or less on right track.
>>
>> This is a different attempt, instead of the local copy idea, I am
>> using the fact that the credentials associated with the mount() will
>> be alive for a long time. I think the result is almost the same. But I
>> could be missing something.
>>
>> TODO:
>> - Add asserts.
>> - Replace ovl_override_creds()/revert_Creds() by
>> ovl_creator_cred()/guard() everywhere.
>> - Probably more.
>>
>>
>> fs/overlayfs/inode.c | 7 ++++---
>> fs/overlayfs/overlayfs.h | 18 ++++++++++++++++++
>> fs/overlayfs/params.c | 4 +++-
>> fs/overlayfs/super.c | 10 +++++++---
>> fs/overlayfs/util.c | 10 ++++++++++
>> include/linux/cred.h | 12 ++++++++++--
>> 6 files changed, 52 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
>> index c63b31a460be..2c016a3bbe2d 100644
>> --- a/fs/overlayfs/inode.c
>> +++ b/fs/overlayfs/inode.c
>> @@ -290,9 +290,9 @@ int ovl_permission(struct mnt_idmap *idmap,
>> struct inode *inode, int mask)
>> {
>> struct inode *upperinode = ovl_inode_upper(inode);
>> + struct ovl_cred ovl_cred;
>> struct inode *realinode;
>> struct path realpath;
>> - const struct cred *old_cred;
>
> Nit: please don't reorder the variable definitions.
>
Sorry about that. "bad" habits from the networking side :-)
>> int err;
>>
>> /* Careful in RCU walk mode */
>> @@ -310,7 +310,9 @@ int ovl_permission(struct mnt_idmap *idmap,
>> if (err)
>> return err;
>>
>> - old_cred = ovl_override_creds(inode->i_sb);
>> + ovl_cred = ovl_creator_cred(inode->i_sb);
>> + guard(ovl_creds)(&ovl_cred);
>> +
>> if (!upperinode &&
>> !special_file(realinode->i_mode) && mask & MAY_WRITE) {
>> mask &= ~(MAY_WRITE | MAY_APPEND);
>> @@ -318,7 +320,6 @@ int ovl_permission(struct mnt_idmap *idmap,
>> mask |= MAY_READ;
>> }
>> err = inode_permission(mnt_idmap(realpath.mnt), realinode, mask);
>> - revert_creds(old_cred);
>>
>> return err;
>> }
>> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
>> index 05c3dd597fa8..22ea3066376e 100644
>> --- a/fs/overlayfs/overlayfs.h
>> +++ b/fs/overlayfs/overlayfs.h
>> @@ -416,6 +416,24 @@ static inline int ovl_do_getattr(const struct path *path, struct kstat *stat,
>> return vfs_getattr(path, stat, request_mask, flags);
>> }
>>
>> +struct ovl_cred {
>> + const struct cred *cred;
>> +};
>> +
>> +static inline struct ovl_cred ovl_creator_cred(struct super_block *sb)
>> +{
>> + struct ovl_fs *ofs = OVL_FS(sb);
>> +
>> + return (struct ovl_cred) { .cred = ofs->creator_cred };
>> +}
>> +
>> +void ovl_override_creds_new(struct ovl_cred *creator_cred);
>> +void ovl_revert_creds_new(struct ovl_cred *creator_cred);
>> +
>> +DEFINE_GUARD(ovl_creds, struct ovl_cred *,
>> + ovl_override_creds_new(_T),
>> + ovl_revert_creds_new(_T));
>> +
>
> This pattern is not unique to overlayfs.
> It is probably better to define a common container type struct override_cred
> in cred.h/cred.c that other code could also use.
>
Good idea.
>> /* util.c */
>> int ovl_get_write_access(struct dentry *dentry);
>> void ovl_put_write_access(struct dentry *dentry);
>> diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
>> index 3fe2dde1598f..008377b9241a 100644
>> --- a/fs/overlayfs/params.c
>> +++ b/fs/overlayfs/params.c
>> @@ -770,8 +770,10 @@ void ovl_free_fs(struct ovl_fs *ofs)
>> kfree(ofs->config.lowerdirs);
>> kfree(ofs->config.upperdir);
>> kfree(ofs->config.workdir);
>> - if (ofs->creator_cred)
>> + if (ofs->creator_cred) {
>> + cred_set_immutable(ofs->creator_cred, false);
>> put_cred(ofs->creator_cred);
>
> Not happy about this API.
>
> Two solutions I can think of:
> 1. (my preference) keep two copies of creator_cred, one refcounted copy
> and one non-refcounted that is used for override_creds()
> 2. put_cred_ref() which explicitly opts-in to dropping refcount on
> a borrowed reference, same as you do above but hidden behind
> a properly documented helper
>
Probably because I already have option (2) more or less understood, but
I think that having a single creator_cred marked as
"non-refcounted/long-lived" is simpler than having two copies, even the
the extra copy only exists for the duration of the override.
But it could be that I still can't imagine what you have in mind about
(1).
>> + }
>> kfree(ofs);
>> }
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index a0967bb25003..1ffb4f0f8186 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -1304,6 +1304,13 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
>> if (!cred)
>> goto out_err;
>>
>> + /* Never override disk quota limits or use reserved space */
>> + cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
>> + /* The cred that is going to be associated with the super
>> + * block will not change.
>> + */
>> + cred_set_immutable(cred, true);
>> +
>
> Likewise, either:
> 1. Create a non-refcounted copy of creator_cred
Ah! I think now I see what you meant. Not sure I like, I think it's a
bit too error prone, it's hard to enforce that the copies would be kept
in sync in general. (even if in practice the only thing that would need
to be kept in sync is the destruction of both, at least for now).
> or
> 2. Use a documented helper prepare_creds_ref() to hide
> this implementation detail
This I like more, having the properties documented in the constructor.
And much better than my _set_immutable().
>
>> err = ovl_fs_params_verify(ctx, &ofs->config);
>> if (err)
>> goto out_err;
>> @@ -1438,9 +1445,6 @@ int ovl_fill_super(struct super_block *sb, struct fs_context *fc)
>> else if (!ofs->nofh)
>> sb->s_export_op = &ovl_export_fid_operations;
>>
>> - /* Never override disk quota limits or use reserved space */
>> - cap_lower(cred->cap_effective, CAP_SYS_RESOURCE);
>> -
>> sb->s_magic = OVERLAYFS_SUPER_MAGIC;
>> sb->s_xattr = ovl_xattr_handlers(ofs);
>> sb->s_fs_info = ofs;
>> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
>> index c3f020ca13a8..9ae9a35a6a7a 100644
>> --- a/fs/overlayfs/util.c
>> +++ b/fs/overlayfs/util.c
>> @@ -68,6 +68,16 @@ const struct cred *ovl_override_creds(struct super_block *sb)
>> return override_creds(ofs->creator_cred);
>> }
>>
>> +void ovl_override_creds_new(struct ovl_cred *creator_cred)
>> +{
>> + creator_cred->cred = override_creds(creator_cred->cred);
>> +}
>> +
>> +void ovl_revert_creds_new(struct ovl_cred *creator_cred)
>> +{
>> + revert_creds(creator_cred->cred);
>> +}
>
> Would look nicer in this generic form, no?
>
> void override_cred_save(struct override_cred *override)
> {
> override->cred = override_creds(override->cred);
> }
>
> void override_cred_restore(struct override_cred *old)
> {
> revert_creds(old->cred);
> }
>
Much nicer :-)
> Which reminds me that memalloc_*_{save,restore} are good
> candidates for defining a guard.
>
>> +
>> /*
>> * Check if underlying fs supports file handles and try to determine encoding
>> * type, in order to deduce maximum inode number used by fs.
>> diff --git a/include/linux/cred.h b/include/linux/cred.h
>> index af8d353a4b86..06eaedfe48ea 100644
>> --- a/include/linux/cred.h
>> +++ b/include/linux/cred.h
>> @@ -151,6 +151,7 @@ struct cred {
>> int non_rcu; /* Can we skip RCU deletion? */
>> struct rcu_head rcu; /* RCU deletion hook */
>> };
>> + bool immutable;
>> } __randomize_layout;
>>
>
> If we choose the design that the immutable/non-refcount property
> is a const property and we need to create a copy of struct cred
> whenever we want to use a non-refcounted copy, then we could
> store this in the union because RCU deletion is also not needed for
> non-refcounted copy:
>
> struct {
> int non_refcount:1; /* A borrowed reference? */
> int non_rcu:1; /* Can we skip RCU deletion? */
> };
> struct rcu_head rcu; /* RCU deletion hook */
> };
>
Ah! Now it makes sense. Thank you.
If you, or any others of course, don't have objections against option
(2), I think I am going to play with it a bit and see how it goes.
Cheers,
--
Vinicius