Re: acl_permission_check: disgusting performance

From: Eric W. Biederman
Date: Thu May 12 2011 - 23:52:51 EST


"Serge E. Hallyn" <serge@xxxxxxxxxx> writes:

> Quoting Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx):
>> Those four instructions are about two thirds of the cost of the
>> function. The last two are about 50% of the cost.
>>
>> They are the accesses to "current", "->cred", "->user" and "->user_ns"
>> respectively (the cmp with the big constant is that compare against
>> "init_ns").
>>
>> Now, if we got rid of them, we wouldn't improve performance by 2/3rds
>> on that function, because we do need the two first accesses for
>> "fsuid" (which is the next check), and the third one (which is
>> currently "cred->user" ends up doing the cache miss that we'd take for
>> "cred->fsuid" anyway. So the first three costs are fairly inescapable.
>>
>> They are also cheaper, probably because those fields tend to be more
>> often in the cache. So it really is that fourth one that hurts the
>> most, as shown by it taking almost a third of the cycles of that
>> function.
>>
>> And it all comes from that annoying commit e795b71799ff0 ("userns:
>> userns: check user namespace for task->file uid equivalence checks"),
>> and I bet nobody involved thought about how expensive that was.
>>
>> That "user_ns" is _really_ expensive to load. And the fact that it's
>> after a chain of three other loads makes it all totally serialized,
>> and makes things much more expensive.
>>
>> Could we perhaps have "user_ns" directly in the "struct cred"? Or
>
> The only reason not to put it into struct cred would be to avoid growing
> the struct cred. For that matter, esp since you can't unshare the user_ns,
> it could also go right into the task_struct.
>
> (Eric's sys_setns patchset will eventually complicate that, but I don't
> think it'll be a problem)

>From the perspective of a process the user namespace and the pid
namespace will never change. I expect we will have something that lets
you change the user namespace and the pid namespace experienced by child
processes. So the sys_setns work should not affect this.

>> could we avoid or short-circuit this check entirely somehow, since it
>> always checks against "init_ns"?
>
> Of course I'm hoping that before fall the check won't be against
> init_ns any more :) I was actually hoping to get back to that next
> week, so I can start by testing the caching you suggest.

Linus brings up a good point that we need to be very careful with
the user namespace and performance. That said I think there is
a cheap trick we can do until the user namespace is actually
good for something.

Something like my untested patch below.

Perhaps current_user_ns needs to move into user_namespace.h to get this
to compile. There are some weird circular header dependencies in there.

In any event an inline version of current_user_ns that returns
init_user_ns in the case where user namespaces aren't compiled in should
fix the immediate performance problems by allowing the compiler to
optimize them out.

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 9aeeb0b..09c76c2 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -357,7 +357,17 @@ static inline void put_cred(const struct cred *_cred)
#define _current_user_ns() (current_cred_xxx(user)->user_ns)
#define current_security() (current_cred_xxx(security))

+#if CONFIG_USER_NS
extern struct user_namespace *current_user_ns(void);
+#else
+struct user_namespace;
+extern struct user_namespace init_user_ns;
+static inline struct user_namespace *current_user_ns(void)
+{
+
+ return &init_user_ns;
+}
+#endif

#define current_uid_gid(_uid, _gid) \
do { \
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/