Re: Read-only `slaves` with shared subtrees?

From: Eric W. Biederman
Date: Wed Sep 20 2017 - 19:07:14 EST


ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:

> Ram Pai <linuxram@xxxxxxxxxx> writes:
>
>> sorry forgot to copy Eric.
>
> Adding fs-devel as well.
>
>> On Wed, Sep 20, 2017 at 12:39:54PM -0700, Ram Pai wrote:
>>> On Tue, Sep 19, 2017 at 04:18:02PM -0700, Dawid Ciezarkiewicz wrote:
>>> > On Mon, Sep 18, 2017 at 1:47 PM, Ram Pai <linuxram@xxxxxxxxxx> wrote:
>>> > > It is possible to make a slave mount readonly, by remounting it with
>>> > > 'ro' flags.
>>> > >
>>> > > something like
>>> > >
>>> > > mount -o bind,remount,ro <slave-mount-dir>
>>> > >
>>> > > Any mount-propagation events reaching a read-only-slave does
>>> > > inherit the slave attribute. However it does not inherit the
>>> > > read-only attribute.
>>> >
>>> > I did try manually remounting, and it worked for me. If this could be
>>> > done atomically
>>> > (which I assume can't be, in the userspace) it could even be a workaround.
>>> >
>>> > > Should it inherit? or should it not? -- that has not been thought
>>> > > off AFAICT. it think we should let it inherit.
>>> >
>>> > It makes sense, and it would work in my use-case. I wonder
>>> > if that would break any existing expectations though.
>>>
>>> It could break existing expectations, for mounts created by propagation.
>>> This needs to be thought through. Also Should the same semantics
>>> apply to MNT_NOSUID, MNT_NOEXEC etc etc?
>>>
>>> Copying Eric. he should be able to tell if any of the container
>>> infrastructure assumes anything about mounts propagated to read-only
>>> mounts.
>
> *Blink*
>
> Let me reiterate what I think I am seeing. The properties of a
> propogated mount taking on attributes from the propagation node, where
> the mount is propagated too.
>
> I honestly can't say if any code cares today, but this feels like it
> will break the principle of least surprise and break someone.

Thinking about this a little I am almost certain this will break
something.

A common pattern for containers is to have a read-only shared portion
typically the rootfs and then other mounts that are read-write. If all
of your propagation nodes hang off of a big read-only mount (and
therefore need to be read-only) forcing everything else to propagate
into the container as read-only is likely going to break something.

> We can safely add this extension by adding a new flag or flags that can
> be set on a pnode that will give the desired semantics. So I expect
> that is a better model then adding new semantics to MNT_RDONLY.

Which means I think to do this safely we really do need to add a new
flag.

Eric


>>> > I could at least test such a patch, it seems like a tiny change.
>>> > Should I give it a try and submit a patch? If you could PM me any pointers
>>> > it could help a lot since I'm not familiar with FS internals. So far I got here:
>>>
>>> Here is a rough patch which will accomplish what you want; not
>>> compile-tested nor tested.
>>>
>>>
>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>> index f8893dc..3239adc 100644
>>> --- a/fs/namespace.c
>>> +++ b/fs/namespace.c
>>> @@ -1061,6 +1061,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>>> list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
>>> unlock_mount_hash();
>>>
>>> + if (flag & CL_READONLY)
>>> + mnt->mnt.mnt_flags |= MNT_READONLY;
>>> +
>>> if ((flag & CL_SLAVE) ||
>>> ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
>>> list_add(&mnt->mnt_slave, &old->mnt_slave_list);
>>> diff --git a/fs/pnode.c b/fs/pnode.c
>>> index 53d411a..aeb5b47 100644
>>> --- a/fs/pnode.c
>>> +++ b/fs/pnode.c
>>> @@ -262,6 +262,8 @@ static int propagate_one(struct mount *m)
>>> /* Notice when we are propagating across user namespaces */
>>> if (m->mnt_ns->user_ns != user_ns)
>>> type |= CL_UNPRIVILEGED;
>>> + if (m->mnt.mnt_flags & MNT_READONLY)
>>> + type |= CL_READONLY;
>>> child = copy_tree(last_source, last_source->mnt.mnt_root, type);
>>> if (IS_ERR(child))
>>> return PTR_ERR(child);
>>> diff --git a/fs/pnode.h b/fs/pnode.h
>>> index dc87e65..7c59469 100644
>>> --- a/fs/pnode.h
>>> +++ b/fs/pnode.h
>>> @@ -29,6 +29,7 @@
>>> #define CL_SHARED_TO_SLAVE 0x20
>>> #define CL_UNPRIVILEGED 0x40
>>> #define CL_COPY_MNT_NS_FILE 0x80
>>> +#define CL_READONLY 0x100
>>>
>>> #define CL_COPY_ALL (CL_COPY_UNBINDABLE | CL_COPY_MNT_NS_FILE)
>>>
>>> RP