Re: overlayfs access checks on underlying layers

From: Paul Moore
Date: Mon Dec 03 2018 - 18:28:15 EST


On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@xxxxxxxxxx> wrote:
> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> >
> >> Possibly I misunderstood you, but I don't think we want to copy-up on
> >> permission denial, as that would still allow the mounter to read/write
> >> special files or execute regular files to which it would normally be
> >> denied access, because the copy would inherit the context specified by
> >> the mounter in the context mount case. It still represents an
> >> escalation of privilege for the mounter. In contrast, the copy-up on
> >> write behavior does not allow the mounter to do anything it could not do
> >> already (i.e. read from the lower, write to the upper).
> > Let's get this straight: when file is copied up, it inherits label
> > from context=, not from label of lower file?
>
> Yes, in the case of context mount, it will get the context mount directory.
>
> In the case of not context mount, it should maintain the label of the
> lower.
>
> > Next question: permission to change metadata is tied to permission to
> > open? Is it possible that open is denied, but metadata can be
> > changed?
>
> Yes, SELinux handles open differently than setattr. Although I am not
> sure if any tools handle this.
>
> > DAC model allows this: metadata change is tied to ownership, not mode
> > bits. And different capability flag.
> >
> > If the same is true for MAC, then the pre-v4.20-rc1 is already
> > susceptible to the privilege escalation you describe, right?
>
> After talking to Vivek, I am not sure there is a privilege escalation.

More on this below, but this thread doesn't have me convinced, and we
are at -rc5 right now. We need to come to some decision on this soon
because we are running out of time before v4.20 is released with this
code.

> For device nodes, the mounter has to have the ability to create the
> device node with the context mount; if he can do this, then he can do it
> with or without Overlay. This might lead to users making mistakes on
> security, but the model is sound. And I think this stands even in the
> case of the lower is mounted NODEV and the upper is not. If the mounter
> can create a device on the upper with a particular label, then he does
> not need the lower.

The problem I have when looking at the current code is that permission
is given, regardless of what is requested, for any special_file() on
an overlayfs mount.

It also looks like the mounter's creds are used when checking
permissions regardless of whether the file has been copied up; I would
expect that the mounter's permissions would only be used when checking
permissions against the lower inode, no? I think there is also some
weird behavior if the underlying inode only allows the mounter to
write (no read) and a write is requested at the overlayfs layer. I'm
sure I'm missing some subtle thing with overlayfs, but why aren't we
doing something like the following:

int ovl_permission(...) {

    if (!realinode) {
        ...
    }

    err = generic_permission(inode, mask);
    if (err)
        return err;

    if (upperinode) {
        err = inode_permission(upperinode, mask);
    } else {
        // on the lower inode, always use the mounter's creds
        old_cred = ovl_override_creds(...);

        // check to see if we have the right perms first, if
        // that fails switch to a read/copy-up check if we
        // are doing a write (note: we are not bypassing the
        // exec check, the task can change the metadata like
        // every other fs)
        err = inode_permission(lowerinode, mask);
        if (err && (mask & (MAY_WRITE | MAY_APPEND))) {
            // PM: my guess is that we also need to add a
            // "&& !special_file(lowerinode)" to the conditional
            // above because you can't copy-up a dev node in the
            // normal sense, but we'll leave that as a discussion
            // point for now...
            // turn the write into a read (copy-up)
            mask &= ~(MAY_WRITE | MAY_APPEND);
            mask |= MAY_READ;
            err = inode_permission(lowerinode, mask);
        }

        // reset the creds
        revert_creds(old_cred);
    }

    return err;
}

> For sockets, I see the case where a process is listening on the lower
> level socket, the mounter mounts the overlay over the directory with the
> socket. Then the mounter changes the attributes of the socket,
> performing a copy up. If the mounter can not talk to the socket and the
> other end is still listening, then this could be an issue. If the
> socket is no longer connected to the listener on the lower, then this is
> not an issue.
>
> Similar for a FIFO.

See my comment "// PM: my guess ..." in the pseudo code above. I
think the write->read permission mask conversion really should only
apply to normal files where you can do a copy-up.
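
To make the mask conversion concrete, here is a small userspace sketch
(not kernel code, and not a proposed patch). The MAY_* values mirror
the ones in include/linux/fs.h; the enum and both helper names are
hypothetical and exist only for illustration. It assumes the retry is
limited to regular files, which is the open discussion point above.

```c
#include <stdbool.h>

/* Same numeric values as the kernel's MAY_* flags (include/linux/fs.h). */
#define MAY_EXEC   0x1
#define MAY_WRITE  0x2
#define MAY_READ   0x4
#define MAY_APPEND 0x8

/* Hypothetical stand-in for the lower inode's file type. */
enum lower_kind { LOWER_REG, LOWER_CHR, LOWER_FIFO, LOWER_SOCK };

/*
 * Hypothetical helper: a denied request on the lower inode is only
 * retried as a copy-up check when it asked for write/append AND the
 * lower file is a regular file (the "!special_file()" idea).
 */
static bool can_retry_as_copy_up(int mask, enum lower_kind kind)
{
    return (mask & (MAY_WRITE | MAY_APPEND)) != 0 && kind == LOWER_REG;
}

/*
 * Hypothetical helper: turn the write into a read, since a copy-up
 * only needs to read the lower file. Other bits such as MAY_EXEC are
 * left alone, so the exec check is not bypassed.
 */
static int copy_up_mask(int mask)
{
    mask &= ~(MAY_WRITE | MAY_APPEND);
    mask |= MAY_READ;
    return mask;
}
```

With this, a denied MAY_WRITE on a regular lower file would be
rechecked as MAY_READ, while the same request on a char device, FIFO,
or socket would simply stay denied.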

> With SELinux we are also always checking not only the file access to the
> socket, but also checking whether the label of the client is able to
> talk to the label of the server daemon. So we are protected by a
> secondary check.

That's making some assumptions on the LSM and the LSM's loaded policy
and is not something I would want to rely on.

--
paul moore
www.paul-moore.com