Re: [PATCH v3 1/3] fs: speed up path lookup with cheaper handling of MAY_EXEC

From: Christian Brauner
Date: Tue Nov 11 2025 - 04:46:11 EST


On Fri, Nov 07, 2025 at 03:21:47PM +0100, Mateusz Guzik wrote:
> The generic inode_permission() routine does work which is known to be of
> no significance for lookup. There are checks for MAY_WRITE, while the
> requested permission is MAY_EXEC. Additionally devcgroup_inode_permission()
> is called to check for devices, but it is an invariant the inode is a
> directory.
>
> Absent a ->permission func, execution lands in generic_permission()
> which checks upfront if the requested permission is granted for
> everyone.
>
> We can elide the branches which are guaranteed to be false and cut
> straight to the check if everyone happens to be allowed MAY_EXEC on the
> inode (which holds true most of the time).
>
> Moreover, filesystems which provide their own ->permission routine can
> take advantage of the optimization by setting the IOP_FASTPERM_MAY_EXEC
> flag on their inodes, which they can legitimately do if their MAY_EXEC
> handling matches generic_permission().
>
> As a simple benchmark, as part of compilation gcc issues access(2) on
> numerous long paths, for example /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o
>
> Issuing access(2) on it in a loop on ext4 on Sapphire Rapids (ops/s):
> before: 3797556
> after: 3987789 (+5%)
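The message does not include the benchmark program. A minimal sketch of such a loop, assuming a plain access(2) call timed with clock_gettime(CLOCK_MONOTONIC); the function name access_ops_per_sec is hypothetical, and the access mode is a parameter because the mode gcc passes is not stated above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/*
 * Calls access(path, mode) iters times and returns the rate in calls per
 * second. Returns -1.0 if the clock cannot be read, if any access() call
 * fails, or if no elapsed time was measured.
 */
static double access_ops_per_sec(const char *path, int mode, long iters)
{
	struct timespec start, end;

	assert(iters > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1.0;
	for (long i = 0; i < iters; i++) {
		if (access(path, mode) != 0)
			return -1.0;
	}
	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1.0;

	double secs = (double)(end.tv_sec - start.tv_sec) +
		      (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	return secs > 0.0 ? (double)iters / secs : -1.0;
}
```

For example, access_ops_per_sec("/usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o", R_OK, 10000000) would approximate the loop described, on a system where that file exists; R_OK there is an assumption.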
>
> Note: this depends on the not-yet-landed ext4 patch to mark inodes with
> cache_no_acl()
>
> Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
> ---
> fs/namei.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
> include/linux/fs.h | 13 +++++++------
> 2 files changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index a9f9d0453425..6b2a5a5478e7 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -540,6 +540,9 @@ static inline int do_inode_permission(struct mnt_idmap *idmap,
> * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
> *
> * Separate out file-system wide checks from inode-specific permission checks.
> + *
> + * Note: lookup_inode_permission_may_exec() does not call here. If you add
> + * MAY_EXEC checks, adjust it.
> */
> static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
> {
> @@ -602,6 +605,42 @@ int inode_permission(struct mnt_idmap *idmap,
> }
> EXPORT_SYMBOL(inode_permission);
>
> +/**
> + * lookup_inode_permission_may_exec - Check traversal right for given inode
> + *
> + * This is a special case routine for may_lookup() making assumptions specific
> + * to path traversal. Use inode_permission() if you are doing something else.
> + *
> + * Work is shaved off compared to inode_permission() as follows:
> + * - we know for a fact there is no MAY_WRITE to worry about
> + * - it is an invariant the inode is a directory
> + *
> + * Since the majority of real-world traversal happens on inodes which grant it for
> + * everyone, we check it upfront and only resort to more expensive work if it
> + * fails.
> + *
> + * Filesystems which have their own ->permission hook and consequently miss out
> + * on IOP_FASTPERM can still get the optimization if they set IOP_FASTPERM_MAY_EXEC
> + * on their directory inodes.
> + */
> +static __always_inline int lookup_inode_permission_may_exec(struct mnt_idmap *idmap,
> + struct inode *inode, int mask)
> +{
> +	/* Lookup already checked this to return -ENOTDIR */
> +	VFS_BUG_ON_INODE(!S_ISDIR(inode->i_mode), inode);
> +	VFS_BUG_ON((mask & ~MAY_NOT_BLOCK) != 0);
> +
> +	mask |= MAY_EXEC;
> +
> +	if (unlikely(!(inode->i_opflags & (IOP_FASTPERM | IOP_FASTPERM_MAY_EXEC))))
> +		return inode_permission(idmap, inode, mask);
> +
> +	if (unlikely(((inode->i_mode & 0111) != 0111) || !no_acl_inode(inode)))

Can you send a follow-up where 0111 is a constant with some descriptive
name, please? Can be local to the file. I hate these raw-coded
permission masks with a passion.

> +		return inode_permission(idmap, inode, mask);
> +
> +	return security_inode_permission(inode, mask);
> +}
> +
> /**
> * path_get - get a reference to a path
> * @path: path to get the reference to
> @@ -1855,7 +1894,7 @@ static inline int may_lookup(struct mnt_idmap *idmap,
> 	int err, mask;
>
> 	mask = nd->flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0;
> -	err = inode_permission(idmap, nd->inode, mask | MAY_EXEC);
> +	err = lookup_inode_permission_may_exec(idmap, nd->inode, mask);
> 	if (likely(!err))
> 		return 0;
>
> @@ -1870,7 +1909,7 @@ static inline int may_lookup(struct mnt_idmap *idmap,
> 	if (err != -ECHILD) // hard error
> 		return err;
>
> -	return inode_permission(idmap, nd->inode, MAY_EXEC);
> +	return lookup_inode_permission_may_exec(idmap, nd->inode, 0);
> }
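may_lookup() first asks for MAY_NOT_BLOCK while in RCU-walk mode (LOOKUP_RCU) and only retries with a blocking check after -ECHILD. The control flow can be modelled in user space as below. The lines between the two hunks are not quoted; the sketch assumes they drop out of RCU mode and return -ECHILD if that fails. All model_* names and the MODEL_MAY_NOT_BLOCK value are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for MAY_NOT_BLOCK; the real value is not relied on. */
#define MODEL_MAY_NOT_BLOCK 0x1

/* Hypothetical directory state consulted by the permission check. */
struct model_dir {
	bool allowed;			/* final answer once blocking work is done */
	bool needs_blocking_work;	/* e.g. data not cached: cannot answer in RCU mode */
	bool can_leave_rcu;		/* models the unquoted step that leaves RCU mode */
};

/* Returns 0, -EACCES, or -ECHILD ("cannot answer without blocking"). */
static int model_permission(const struct model_dir *d, int mask)
{
	if (d->needs_blocking_work && (mask & MODEL_MAY_NOT_BLOCK))
		return -ECHILD;
	return d->allowed ? 0 : -EACCES;
}

static int model_may_lookup(const struct model_dir *d, bool rcu_walk)
{
	int mask = rcu_walk ? MODEL_MAY_NOT_BLOCK : 0;
	int err = model_permission(d, mask);

	if (err == 0)
		return 0;
	if (!rcu_walk)			/* failure outside RCU mode is final */
		return err;
	if (!d->can_leave_rcu)		/* could not drop out of RCU: redo the walk */
		return -ECHILD;
	if (err != -ECHILD)		/* hard error such as -EACCES */
		return err;
	return model_permission(d, 0);	/* blocking retry with mask 0 */
}
```

With the patch, the blocking retry passes 0 and lookup_inode_permission_may_exec() ORs in MAY_EXEC itself, so the model's mask 0 corresponds to that call.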
>
> static int reserve_stack(struct nameidata *nd, struct path *link)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 03e450dd5211..7d5de647ac7b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -647,13 +647,14 @@ is_uncached_acl(struct posix_acl *acl)
> return (long)acl & 1;
> }
>
> -#define IOP_FASTPERM    0x0001
> -#define IOP_LOOKUP      0x0002
> -#define IOP_NOFOLLOW    0x0004
> -#define IOP_XATTR       0x0008
> +#define IOP_FASTPERM            0x0001
> +#define IOP_LOOKUP              0x0002
> +#define IOP_NOFOLLOW            0x0004
> +#define IOP_XATTR               0x0008
>  #define IOP_DEFAULT_READLINK    0x0010
> -#define IOP_MGTIME      0x0020
> -#define IOP_CACHED_LINK 0x0040
> +#define IOP_MGTIME              0x0020
> +#define IOP_CACHED_LINK         0x0040
> +#define IOP_FASTPERM_MAY_EXEC   0x0080
>
> /*
> * Inode state bits. Protected by inode->i_lock
> --
> 2.48.1
>
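Because i_opflags is tested with bitwise AND, the new IOP_FASTPERM_MAY_EXEC value only works if it occupies a bit that no other IOP_* flag uses. A small user-space check of that property, with the values copied by hand from the include/linux/fs.h hunk of the patch; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* IOP_* values as listed in the include/linux/fs.h hunk of the patch. */
static const unsigned int iop_flag_values[] = {
	0x0001,	/* IOP_FASTPERM */
	0x0002,	/* IOP_LOOKUP */
	0x0004,	/* IOP_NOFOLLOW */
	0x0008,	/* IOP_XATTR */
	0x0010,	/* IOP_DEFAULT_READLINK */
	0x0020,	/* IOP_MGTIME */
	0x0040,	/* IOP_CACHED_LINK */
	0x0080,	/* IOP_FASTPERM_MAY_EXEC (new) */
};

/* True if v has exactly one bit set. */
static bool is_single_bit(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if every value is a single bit and no two values share a bit. */
static bool flags_are_disjoint(const unsigned int *flags, size_t n)
{
	unsigned int seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (!is_single_bit(flags[i]) || (seen & flags[i]))
			return false;
		seen |= flags[i];
	}
	return true;
}
```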