Re: [PATCH resend] fs/namei.c: micro-optimize acl_permission_check

From: Linus Torvalds
Date: Sun Jun 07 2020 - 15:49:28 EST


On Sun, Jun 7, 2020 at 9:37 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > That will kinda work, except you do that mask &= MAY_RWX before
> > check_acl(), which cares about MAY_NOT_BLOCK and who knows what other bits.
>
> Good catch.

With the change to not clear the non-rwx bits in general, the owner
case now wants to do the masking, and then the "shift left by 6"
modification makes no sense since it only makes for a bigger constant
(the only reason to do the shift-right was so that the bitwise not of
the i_mode could be done in parallel with the shift, but with the
masking that instruction scheduling optimization becomes kind of
immaterial too). So I modified that patch to not bother, and add a
comment about MAY_NOT_BLOCK.

And since I was looking at the MAY_NOT_BLOCK logic, it was not only
not mentioned in comments, it also had some really confusing code
around it.

The posix_acl_permission() looked like it tried to conserve that bit,
which is completely wrong. It wasn't a bug only for the simple reason
that the only two call-sites had either explicitly cleared the bit
when calling, or had tested that the bit wasn't set in the first
place.

So as a result, I wrote a second patch to clear that confusion up.

Rasmus, say the word and I'll mark you for authorship on the first one.

Comments? Can you find something else wrong here, or some other fixup to do?

Al, any reaction?

Linus
From 09cc0faef0766da8ff8e6a82cfc5c8c53a01d0a7 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Fri, 5 Jun 2020 13:40:45 -0700
Subject: [PATCH 1/2] vfs: do not do group lookup when not necessary

Rasmus Villemoes points out that the 'in_group_p()' tests can be a
noticeable expense, and often completely unnecessary. A common
situation is that the 'group' bits are the same as the 'other' bits
wrt the permissions we want to test.

So rewrite 'acl_permission_check()' to not bother checking for group
ownership when the permission check doesn't care.

For example, if we're asking for read permissions, and both 'group' and
'other' allow reading, there's really no reason to check if we're part
of the group or not: either way, we'll allow it.

Rasmus says:
"On a bog-standard Ubuntu 20.04 install, a workload consisting of
compiling lots of userspace programs (i.e., calling lots of
short-lived programs that all need to get their shared libs mapped in,
and the compilers poking around looking for system headers - lots of
/usr/lib, /usr/bin, /usr/include/ accesses) puts in_group_p around
0.1% according to perf top.

System-installed files are almost always 0755 (directories and
binaries) or 0644, so in most cases, we can avoid the binary search
and the cost of pulling the cred->groups array and in_group_p() .text
into the cpu cache"

Reported-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
fs/namei.c | 44 +++++++++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 15 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index d81f73ff1a8b..e74a7849e9b5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -288,37 +288,51 @@ static int check_acl(struct inode *inode, int mask)
}

/*
- * This does the basic permission checking
+ * This does the basic UNIX permission checking.
+ *
+ * Note that the POSIX ACL check cares about the MAY_NOT_BLOCK bit,
+ * for RCU walking.
*/
static int acl_permission_check(struct inode *inode, int mask)
{
unsigned int mode = inode->i_mode;

- if (likely(uid_eq(current_fsuid(), inode->i_uid)))
+ /* Are we the owner? If so, ACL's don't matter */
+ if (likely(uid_eq(current_fsuid(), inode->i_uid))) {
+ mask &= 7;
mode >>= 6;
- else {
- if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
- int error = check_acl(inode, mask);
- if (error != -EAGAIN)
- return error;
- }
+ return (mask & ~mode) ? -EACCES : 0;
+ }

- if (in_group_p(inode->i_gid))
- mode >>= 3;
+ /* Do we have ACL's? */
+ if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
+ int error = check_acl(inode, mask);
+ if (error != -EAGAIN)
+ return error;
}

+ /* Only RWX matters for group/other mode bits */
+ mask &= 7;
+
/*
- * If the DACs are ok we don't need any capability check.
+ * Are the group permissions different from
+ * the other permissions in the bits we care
+ * about? Need to check group ownership if so.
*/
- if ((mask & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
- return 0;
- return -EACCES;
+ if (mask & (mode ^ (mode >> 3))) {
+ if (in_group_p(inode->i_gid))
+ mode >>= 3;
+ }
+
+ /* Bits in 'mode' clear that we require? */
+ return (mask & ~mode) ? -EACCES : 0;
}

/**
* generic_permission - check for access rights on a Posix-like filesystem
* @inode: inode to check access rights for
- * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC, ...)
+ * @mask: right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC,
+ * %MAY_NOT_BLOCK ...)
*
* Used to check for read/write/execute permissions on a file.
* We use "fsuid" for this, letting us set arbitrary permissions
--
2.27.0.rc1.8.g04bd0c80d7

From ead0cfe68e04c167034405a785878058ceb6d589 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 7 Jun 2020 12:19:06 -0700
Subject: [PATCH 2/2] vfs: clean up posix_acl_permission() logic aroudn
MAY_NOT_BLOCK

posix_acl_permission() does not care about MAY_NOT_BLOCK, and in fact
the permission logic internally must not check that bit (it's only for
upper layers to decide whether they can block to do IO to look up the
acl information or not).

But the way the code was written, it _looked_ like it cared, since the
function explicitly did not mask that bit off.

But it has exactly two callers: one for when that bit is set, which
first clears the bit before calling posix_acl_permission(), and the
other call site when that bit was clear.

So stop the silly games "saving" the MAY_NOT_BLOCK bit that must not be
used for the actual permission test, and that currently is pointlessly
cleared by the callers when the function itself should just not care.

Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
fs/namei.c | 2 +-
fs/posix_acl.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e74a7849e9b5..72d4219c93ac 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -271,7 +271,7 @@ static int check_acl(struct inode *inode, int mask)
/* no ->get_acl() calls in RCU mode... */
if (is_uncached_acl(acl))
return -ECHILD;
- return posix_acl_permission(inode, acl, mask & ~MAY_NOT_BLOCK);
+ return posix_acl_permission(inode, acl, mask);
}

acl = get_acl(inode, ACL_TYPE_ACCESS);
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index 249672bf54fe..95882b3f5f62 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -350,7 +350,7 @@ posix_acl_permission(struct inode *inode, const struct posix_acl *acl, int want)
const struct posix_acl_entry *pa, *pe, *mask_obj;
int found = 0;

- want &= MAY_READ | MAY_WRITE | MAY_EXEC | MAY_NOT_BLOCK;
+ want &= MAY_READ | MAY_WRITE | MAY_EXEC;

FOREACH_ACL_ENTRY(pa, acl, pe) {
switch(pa->e_tag) {
--
2.27.0.rc1.8.g04bd0c80d7