[RFC v2 PATCH 4/8] VFS:userns: shift UID/GID to virtual view during permission access

From: Djalal Harouni
Date: Wed May 04 2016 - 10:29:54 EST


If both the mount namespace and the mount point support UID/GID shifts,
then before doing any permission check, translate inode->{i_uid|i_gid}
into the kernel virtual view and use the result for the permission
checks. If UID/GID shifts are not supported, fall back to the on-disk
inode->{i_uid|i_gid} values.

The VFS shifts these values to the virtual view; the result is compared
with current's fsuid and fsgid and used for the classic and capability
checks. Since inode->{i_uid|i_gid} always contain the on-disk values,
the virtual translation is done whenever an access check is needed.

This solves the problem of privileged user namespaces, or users inside
containers, that want to access files but are denied because the VFS
compares the on-disk inode IDs against their global kuid/kgid.
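
The translate-or-fall-back step described above can be sketched in user
space. vfs_shift_i_uid_to_virtual() is not defined in this patch, so the
following is only a model under stated assumptions: the struct, the
shift_enabled flag and the fixed shift_base offset are hypothetical
names chosen for illustration, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the parts of struct inode used here. */
struct inode_model {
	uint32_t i_uid;      /* on-disk owner, never rewritten */
	bool shift_enabled;  /* mount and mount namespace both support shifts */
	uint32_t shift_base; /* assumed offset into the global ID space */
};

/* Translate the on-disk owner into the virtual view, or fall back to it. */
static uint32_t model_shift_i_uid_to_virtual(const struct inode_model *inode)
{
	assert(inode != NULL);
	if (!inode->shift_enabled)
		return inode->i_uid;
	return inode->i_uid + inode->shift_base;
}

/* Owner check: compare the caller's global fsuid with the translated owner. */
static bool model_owner_matches(uint32_t fsuid, const struct inode_model *inode)
{
	return fsuid == model_shift_i_uid_to_virtual(inode);
}
```

With shift_base = 1000000, a caller whose global fsuid is 1000000 matches
an inode with on-disk i_uid 0 only when shift_enabled is set; without it
the comparison is against 0 and fails.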

Permission checks inside user_ns_X
----------------------------------

Without this patch:
-------------------------------------------------------------------------
inode->i_uid on disk | init_user_ns uid | user_ns_X uid        | Access
-------------------------------------------------------------------------
0                    | 1000000          | 0 (userns root)      | Denied
-------------------------------------------------------------------------
999                  | 1000999          | 999                  | Denied
-------------------------------------------------------------------------
1000                 | 1001000          | 1000                 | Denied
-------------------------------------------------------------------------
1000                 | 1000000          | 0 (userns root CAPS) | Denied
-------------------------------------------------------------------------
0                    | 1001000          | 1000                 | Denied
-------------------------------------------------------------------------

With this patch:
--------------------------------------------------------------------------
inode->i_uid on disk | init_user_ns uid | user_ns_X uid        | Access
--------------------------------------------------------------------------
0                    | 1000000          | 0 (userns root)      | Granted
--------------------------------------------------------------------------
999                  | 1000999          | 999                  | Granted
--------------------------------------------------------------------------
1000                 | 1001000          | 1000                 | Granted
--------------------------------------------------------------------------
1000                 | 1000000          | 0 (userns root CAPS) | Granted
--------------------------------------------------------------------------
999                  | 1000000          | 0 (userns root CAPS) | Granted
--------------------------------------------------------------------------
0                    | 1001000          | 1000                 | Denied
--------------------------------------------------------------------------
0                    | 1000999          | 999                  | Denied
--------------------------------------------------------------------------
1000                 | 1000999          | 999                  | Denied
--------------------------------------------------------------------------

* CAPS: capabilities. With this patch, access is granted because the
caller holds the required capabilities inside user_ns_X and the shifted
UID/GID of the inode are also mapped in that user_ns_X.

A privileged root user with uid 0 inside the container's user namespace
will be able to access an inode with on-disk inode->i_uid == 0, provided
that the inode is on a file system that supports VFS UID/GID shifts and
the caller is inside a mount namespace that also supports them.
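
The rows in the tables above can be reproduced with a small user-space
model. This is only a sketch under stated assumptions: user_ns_X is
modelled as one contiguous range starting at global ID 1000000 with a
hypothetical size of 65536, the shift offset is assumed to equal the
start of that range, and only the owner and capability paths are
modelled (group and "other" mode bits are ignored). None of the names
below are kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a user namespace with one contiguous ID range. */
struct userns_model {
	uint32_t first; /* first global ID mapped into the namespace */
	uint32_t count; /* number of IDs in the range (assumed value) */
};

/* Stand-in for kuid_has_mapping(): is this global ID visible inside ns? */
static bool model_id_has_mapping(const struct userns_model *ns, uint32_t id)
{
	assert(ns != NULL);
	return id >= ns->first && id - ns->first < ns->count;
}

/*
 * Owner match first, then the capability path: the caller must hold the
 * capability in ns and the (possibly shifted) owner must be mapped there.
 */
static bool model_access_granted(const struct userns_model *ns,
				 uint32_t fsuid, bool has_cap,
				 uint32_t disk_uid, bool shifted)
{
	/* Assumption: the shift offset equals the start of the ns range. */
	uint32_t owner = shifted ? disk_uid + ns->first : disk_uid;

	if (fsuid == owner)
		return true;
	return has_cap && model_id_has_mapping(ns, owner);
}
```

In this model the unshifted owner 0 or 1000 has no mapping inside
user_ns_X, so even userns root with capabilities is denied; after the
shift the owner becomes 1000000 or 1001000, which is mapped, and the
CAPS rows are granted.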

Signed-off-by: Dongsu Park <dongsu@xxxxxxxxxxxx>
Signed-off-by: Djalal Harouni <tixxdz@xxxxxxxxxx>
---
 fs/inode.c          |  5 +++--
 fs/namei.c          |  6 ++++--
 kernel/capability.c | 14 ++++++++++++--
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 69b8b52..07daf5f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1961,12 +1961,13 @@ EXPORT_SYMBOL(inode_init_owner);
 bool inode_owner_or_capable(const struct inode *inode)
 {
 	struct user_namespace *ns;
+	kuid_t i_uid = vfs_shift_i_uid_to_virtual(inode);

-	if (uid_eq(current_fsuid(), inode->i_uid))
+	if (uid_eq(current_fsuid(), i_uid))
 		return true;

 	ns = current_user_ns();
-	if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, inode->i_uid))
+	if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, i_uid))
 		return true;
 	return false;
 }
diff --git a/fs/namei.c b/fs/namei.c
index 1d9ca2d..f7ee498 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -289,8 +289,10 @@ static int check_acl(struct inode *inode, int mask)
 static int acl_permission_check(struct inode *inode, int mask)
 {
 	unsigned int mode = inode->i_mode;
+	kuid_t i_uid = vfs_shift_i_uid_to_virtual(inode);
+	kgid_t i_gid = vfs_shift_i_gid_to_virtual(inode);

-	if (likely(uid_eq(current_fsuid(), inode->i_uid)))
+	if (likely(uid_eq(current_fsuid(), i_uid)))
 		mode >>= 6;
 	else {
 		if (IS_POSIXACL(inode) && (mode & S_IRWXG)) {
@@ -299,7 +301,7 @@ static int acl_permission_check(struct inode *inode, int mask)
 				return error;
 		}

-		if (in_group_p(inode->i_gid))
+		if (in_group_p(i_gid))
 			mode >>= 3;
 	}

diff --git a/kernel/capability.c b/kernel/capability.c
index 45432b5..fdc8afb 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -441,9 +441,19 @@ EXPORT_SYMBOL(file_ns_capable);
  */
 bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
 {
+	kuid_t i_uid;
+	kgid_t i_gid;
 	struct user_namespace *ns = current_user_ns();

-	return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
-		kgid_has_mapping(ns, inode->i_gid);
+	/*
+	 * Check if inode's UID/GID are meant to be shifted into the current
+	 * mount namespace. If so, use the result to check whether the shifted
+	 * UID/GID have a mapping in current's user namespace.
+	 */
+	i_uid = vfs_shift_i_uid_to_virtual(inode);
+	i_gid = vfs_shift_i_gid_to_virtual(inode);
+
+	return ns_capable(ns, cap) && kuid_has_mapping(ns, i_uid) &&
+		kgid_has_mapping(ns, i_gid);
 }
 EXPORT_SYMBOL(capable_wrt_inode_uidgid);
--
2.5.5