Re: [syzbot] [integrity] [overlayfs] general protection fault in d_path
From: Amir Goldstein
Date: Sun Oct 01 2023 - 10:52:08 EST
On Fri, Sep 29, 2023 at 3:39 PM Stefan Berger <stefanb@xxxxxxxxxxxxx> wrote:
>
>
> On 9/29/23 00:25, Amir Goldstein wrote:
> > On Fri, Sep 29, 2023 at 3:02 AM Stefan Berger <stefanb@xxxxxxxxxxxxx> wrote:
> >>
> >> On 9/21/23 07:48, Christian Brauner wrote:
> >>> Imho, this is all very wild but I'm not judging.
> >>>
> >>> Two solutions imho:
> >>> (1) teach stacking filesystems like overlayfs and ecryptfs to use
> >>> vfs_getattr_nosec() in their ->getattr() implementation when they
> >>> are themselves called via vfs_getattr_nosec(). This will fix this by
> >>> not triggering another LSM hook.
> >>
> >> You can avoid all this churn.
> >> Just use the existing query_flags arg.
> >> Nothing outside the AT_STATX_SYNC_TYPE query_flags is
> >> passed into filesystems from userspace.
> >>
> >> Mast out AT_STATX_SYNC_TYPE in vfs_getattr()
> >> And allow kernel internal request_flags in vfs_getattr_nosec()
> Hm, I thought that vfs_getattr_nosec needs to pass AT_GETATTR_NOSEC into
> ->getattr().
> >>
> >> The AT_ flag namespace is already a challenge, but mixing user
> >> flags and kernel-only flags in vfs interfaces has been done before.
> >>
> >> ...
>
>
> That's what I wanted to avoid since now all filesystems' getattr() may
> have the AT_GETATTR_NOSEC mixed into the query_flags.
>
> Anyway, here's what I currently have:
>
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 992d9c7e64ae..f7b5b1843dcc 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -998,16 +998,28 @@ static int ecryptfs_getattr_link(struct mnt_idmap
> *idmap,
> return rc;
> }
>
> +static int ecryptfs_do_getattr(bool nosec, const struct path *path,
> + struct kstat *stat, u32 request_mask,
> + unsigned int flags)
> +{
> + if (nosec)
> + return vfs_getattr_nosec(path, stat, request_mask, flags);
> + return vfs_getattr(path, stat, request_mask, flags);
> +}
> +
> static int ecryptfs_getattr(struct mnt_idmap *idmap,
> const struct path *path, struct kstat *stat,
> u32 request_mask, unsigned int flags)
> {
> struct dentry *dentry = path->dentry;
> struct kstat lower_stat;
> + bool nosec = flags & AT_GETATTR_NOSEC;
> int rc;
>
> - rc = vfs_getattr(ecryptfs_dentry_to_lower_path(dentry), &lower_stat,
> - request_mask, flags);
> + flags &= ~AT_INTERNAL_MASK;
> +
> + rc = ecryptfs_do_getattr(nosec,
> ecryptfs_dentry_to_lower_path(dentry),
> + &lower_stat, request_mask, flags);
> if (!rc) {
> fsstack_copy_attr_all(d_inode(dentry),
> ecryptfs_inode_to_lower(d_inode(dentry)));
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 83ef66644c21..ec4ceb5b4ebf 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -166,12 +166,15 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
> int fsid = 0;
> int err;
> bool metacopy_blocks = false;
> + bool nosec = flags & AT_GETATTR_NOSEC;
> +
> + flags &= ~AT_INTERNAL_MASK;
I don't understand why you need the nosec helper arg.
What's wrong with:
static int ovl_do_getattr(const struct path *path,
struct kstat *stat, u32 request_mask,
unsigned int flags)
{
if (flags & AT_GETATTR_NOSEC)
return vfs_getattr_nosec(path, stat, request_mask, flags);
return vfs_getattr(path, stat, request_mask, flags);
}
likewise for ecryptfs.
>
> metacopy_blocks = ovl_is_metacopy_dentry(dentry);
>
> type = ovl_path_real(dentry, &realpath);
> old_cred = ovl_override_creds(dentry->d_sb);
> - err = vfs_getattr(&realpath, stat, request_mask, flags);
> + err = ovl_do_getattr(nosec, &realpath, stat, request_mask, flags);
> if (err)
> goto out;
>
> @@ -196,8 +199,8 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
> (!is_dir ? STATX_NLINK : 0);
>
> ovl_path_lower(dentry, &realpath);
> - err = vfs_getattr(&realpath, &lowerstat,
> - lowermask, flags);
> + err = ovl_do_getattr(nosec, &realpath, &lowerstat,
> + lowermask, flags);
> if (err)
> goto out;
>
> @@ -249,8 +252,9 @@ int ovl_getattr(struct mnt_idmap *idmap, const
> struct path *path,
>
> ovl_path_lowerdata(dentry, &realpath);
> if (realpath.dentry) {
> - err = vfs_getattr(&realpath, &lowerdatastat,
> - lowermask, flags);
> + err = ovl_do_getattr(nosec, &realpath,
> + &lowerdatastat, lowermask,
> + flags);
> if (err)
> goto out;
> } else {
> diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
> index 9817b2dcb132..cbee3ff3bab7 100644
> --- a/fs/overlayfs/overlayfs.h
> +++ b/fs/overlayfs/overlayfs.h
> @@ -397,6 +397,15 @@ static inline bool ovl_open_flags_need_copy_up(int
> flags)
> return ((OPEN_FMODE(flags) & FMODE_WRITE) || (flags & O_TRUNC));
> }
>
> +static inline int ovl_do_getattr(bool nosec, const struct path *path,
> + struct kstat *stat, u32 request_mask,
> + unsigned int flags)
> +{
> + if (nosec)
> + return vfs_getattr_nosec(path, stat, request_mask, flags);
> + return vfs_getattr(path, stat, request_mask, flags);
> +}
> +
> /* util.c */
> int ovl_want_write(struct dentry *dentry);
> void ovl_drop_write(struct dentry *dentry);
> diff --git a/fs/stat.c b/fs/stat.c
> index d43a5cc1bfa4..3250e427e1aa 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -133,7 +133,8 @@ int vfs_getattr_nosec(const struct path *path,
> struct kstat *stat,
> idmap = mnt_idmap(path->mnt);)
> if (inode->i_op->getattr)
> return inode->i_op->getattr(idmap, path, stat,
> - request_mask, query_flags);
> + request_mask,
> + query_flags | AT_GETATTR_NOSEC);
>
You also need in vfs_getattr():
if (WARN_ON_ONCE(query_flags & AT_GETATTR_NOSEC)
return -EPERM;
> generic_fillattr(idmap, request_mask, inode, stat);
> return 0;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index b528f063e8ff..9069d6a301f0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2027,6 +2027,12 @@ struct super_operations {
> void (*shutdown)(struct super_block *sb);
> };
>
> +/*
> + * Internal query flags. See fcntl.h AT_xxx flags for the rest.
> + */
> +#define AT_GETATTR_NOSEC 0x80000000
> +#define AT_INTERNAL_MASK 0x80000000
> +
Yeh, the problem is that people adding flags to fcntl.h
won't be seeing this comment and we don't want to put those
"expose" those flags in uapi header either.
One possible compromise is to put them in fcntl.h under
#ifdef __KERNEL__
Very controversial, yes, I know.
The whole concept of mixing functional flags (i.e. AT_STATX_*)
with lookup AT_* flags is controversial to begin with, not to
mention flag overload for different syscalls (i.e. AT_EACCESS/
AT_REMOVEDIR/AT_HANDLE_FID).
But since we have accepted this necessary evil, I think that at least
we could explicitly partition the AT_ flags namespace and declare:
#ifdef __KERNEL__
AT_LOOKUP_FLAGS_MASK ...
AT_MOUNT_FLAGS_MASK AT_RECURSIVE
AT_SYSCALL_PRIVATE_MASK AT_EACCESS
AT_SYNC_TYPE_MASK AT_STATX_SYNC_TYPE
AT_KERNEL_INTERNAL_MASK 0x80000000
#endif
Sfefan,
I feel that I have to stress the point that this is only *my* opinion and
I accept that others (like some vfs co-maintains..) may passionately
disagree to further pollute the AT_ flags namespace.
The advantage of the AT_KERNEL_INTERNAL_MASK is that it is
in no way exposed to users via ABI, so if we decide to undo this
decision anytime in the future and reclaim the internal AT_ flags,
we could do that.
IMO, this is a decent compromise compared to the very noisy
patch that adds another flags argument to ->getattr() just to fix
this IMA/overlayfs corner case.
Thanks,
Amir.