Re: [PATCH 3/4] autofs - make mountpoint checks namespace aware
From: Ian Kent
Date: Sun Sep 18 2016 - 20:58:25 EST
On Fri, 2016-09-16 at 10:58 +0800, Ian Kent wrote:
> On Thu, 2016-09-15 at 19:47 -0500, Eric W. Biederman wrote:
> > Ian Kent <raven@xxxxxxxxxx> writes:
> >
> > > On Wed, 2016-09-14 at 21:08 -0500, Eric W. Biederman wrote:
> > > > Ian Kent <raven@xxxxxxxxxx> writes:
> > > >
> > > > > On Wed, 2016-09-14 at 12:28 -0500, Eric W. Biederman wrote:
> > > > > > Ian Kent <raven@xxxxxxxxxx> writes:
> > > > > >
> > > > > > > If an automount mount is clone(2)ed into a file system that is
> > > > > > > propagation private, when it later expires in the originating
> > > > > > > namespace subsequent calls to autofs ->d_automount() for that
> > > > > > > dentry in the original namespace will return ELOOP until the
> > > > > > > mount is manually umounted in the cloned namespace.
> > > > > > >
> > > > > > > In the same way, if an autofs mount is triggered by automount(8)
> > > > > > > running within a container the dentry will be seen as mounted in
> > > > > > > the root init namespace and calls to ->d_automount() in that
> > > > > > > namespace
> > > > > > > will return ELOOP until the mount is umounted within the
> > > > > > > container.
> > > > > > >
> > > > > > > Also, have_submounts() can return an incorrect result when a mount
> > > > > > > exists in a namespace other than the one being checked.
> > > > > >
> > > > > > Overall this appears to be a fairly reasonable set of changes. It
> > > > > > does increase the expense when an actual mount point is encountered,
> > > > > > but if these are the desired semantics some increase in cost when a
> > > > > > dentry is a mountpoint is unavoidable.
> > > > > >
> > > > > > May I ask the motivation for this set of changes? Reading through
> > > > > > the changes I don't grasp why we want to change the behavior of
> > > > > > autofs. What problem is being solved? What are the benefits?
> > > > >
> > > > > LOL, it's all too easy for me to give a patch description that I think
> > > > > explains a problem I need to solve without realizing it isn't clear to
> > > > > others what the problem is, sorry about that.
> > > > >
> > > > > For quite a while now, and not that frequently but consistently, I've
> > > > > been getting reports of people using autofs getting ELOOP errors and
> > > > > not being able to mount automounts.
> > > > >
> > > > > This has been due to the cloning of autofs file systems (that have
> > > > > active automounts at the time of the clone) by other systems.
> > > > >
> > > > > An unshare, as one example, can easily result in the cloning of an
> > > > > autofs file system that has active mounts which shows this problem.
> > > > >
> > > > > Once an active mount that has been cloned is expired in the namespace
> > > > > that performed the unshare it can't be (auto)mounted again in the
> > > > > originating namespace because the mounted check in the autofs module
> > > > > will think it is already mounted.
> > > > >
> > > > > I'm not sure this is a clear description either, hopefully it is enough
> > > > > to demonstrate the type of problem I'm trying to solve.
> > > >
> > > > So to rephrase the problem is that an autofs instance can stop working
> > > > properly from the perspective of the mount namespace it is mounted in
> > > > if the autofs instance is shared between multiple mount namespaces. The
> > > > problem is that mounts and unmounts do not always propagate between
> > > > mount namespaces. This lack of symmetric mount/unmount behavior
> > > > leads to mountpoints that become unusable.
> > >
> > > That's right.
> > >
> > > It's also worth considering that symmetric mount propagation is usually
> > > not the behaviour needed either and things like LXC and Docker are set to
> > > propagation slave because of problems caused by propagation back to the
> > > parent namespace.
> > >
> > > So a mount can be triggered within a container, mounted by the automount
> > > daemon in the parent namespace, and propagated to the child and similarly
> > > for expires, which is the common use case now.
> > >
> > > >
> > > > Which leads to the question what is the expected new behavior with your
> > > > patchset applied. New mounts can be added in the parent mount namespace
> > > > (because the test is local). Does your change also allow the
> > > > autofs mountpoints to be used in the other mount namespaces that share
> > > > the autofs instance if everything becomes unmounted?
> > >
> > > The problem occurs when the subordinate namespace doesn't deal with these
> > > propagated mounts properly, although they can obviously be used by the
> > > subordinate namespace.
> > >
> > > >
> > > > Or is it expected that other mount namespaces that share an autofs
> > > > instance will get changes in their mounts via mount propagation and if
> > > > mount propagation is insufficient they are on their own.
> > >
> > > Namespaces that receive updates via mount propagation from a parent will
> > > continue to function as they do now.
> > >
> > > Mounts that don't get updates via mount propagation will retain the mount
> > > to use if they need to, as they would without this change, but the
> > > originating namespace will also continue to function as expected.
> > >
> > > The child namespace needs to clean up its mounts on exit, which it had to
> > > do prior to this change also.
> > >
> > > >
> > > > I believe this is a question of how do notifications of the desire for
> > > > an automount work after your change, and are those notifications
> > > > consistent with your desired and/or expected behavior.
> > >
> > > It sounds like you might be assuming the service receiving these cloned
> > > mounts actually wants to use them or is expecting them to behave like
> > > automount mounts. But that's not what I've seen and is not the way these
> > > cloned mounts behave without the change.
> > >
> > > However, as has probably occurred to you by now, there is a semantic
> > > change with this for namespaces that don't receive mount propagation.
> > >
> > > If a mount request is triggered by an access in the subordinate namespace
> > > for a dentry that is already mounted in the parent namespace it will
> > > silently fail (in that a mount won't appear in the subordinate namespace)
> > > rather than getting an ELOOP error as it would now.
> > >
> > > It's also the case that, if such a mount isn't already mounted, it will
> > > cause a mount to occur in the parent namespace. But that is also the way
> > > it is without the change.
> > >
> > > TBH I don't know yet how to resolve that, ideally the cloned mounts would
> > > not appear in the subordinate namespace upon creation but that's also not
> > > currently possible to do and even if it was it would mean quite a change
> > > to the way things behave now.
> > >
> > > All in all I believe the change here solves a problem that needs to be
> > > solved without affecting normal usage at the expense of a small behaviour
> > > change to cases where automount isn't providing a mounting service.
> >
> > That sounds like a reasonable semantic change. Limiting the responses
> > of the autofs mount path to what is present in the mount namespace
> > of the program that actually performs the autofs mounts seems needed.
>
> Indeed, yes.
>
> >
> > In fact the entire local mount concept exists because I was solving a
> > very similar problem for rename, unlink and rmdir. Where a cloned mount
> > namespace could cause a denial of service attack on the original
> > mount namespace.
> >
> > I don't know if this change makes sense for mount expiry.
>
> Originally I thought it did but now I think you're right, it won't actually
> make a difference.
>
> Let me think a little more about it, I thought there was a reason I included
> the expire in the changes but I can't remember now.
>
> It may be that originally I thought individual automount(8) instances within
> containers could be affected by an instance of automount(8) in the root
> namespace (and vice versa) but now I think these will all be isolated.

I also thought that the autofs expire would continue to see the umounted mount
and continue calling back to the daemon in an attempt to umount it.

That isn't the case, so I can drop the changes to the expire code as you
recommend.
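
For anyone following the thread, the mount-path problem described in the patch
description can be pictured with a small toy model. This is only a sketch in
user-space Python, not kernel code: the class ToyMounts, its methods and the
path "/mnt/auto/dir" are all made up for illustration, and the model assumes
a private-propagation clone simply copies the parent's mounts once and never
propagates afterwards.

```python
class ToyMounts:
    """Toy model (hypothetical): tracks (namespace, dentry) pairs that have a mount."""

    def __init__(self):
        self.mounted = set()

    def mount(self, ns, dentry):
        self.mounted.add((ns, dentry))

    def umount(self, ns, dentry):
        self.mounted.discard((ns, dentry))

    def clone_private(self, parent, child):
        # Assumption: the clone copies the parent's mounts once; with private
        # propagation, later mounts and umounts are not shared.
        for ns, dentry in list(self.mounted):
            if ns == parent:
                self.mounted.add((child, dentry))

    def mounted_anywhere(self, dentry):
        # Models a check that is true if the dentry is a mountpoint in ANY namespace.
        return any(d == dentry for _, d in self.mounted)

    def mounted_in(self, ns, dentry):
        # Models a namespace-aware check: only the given namespace counts.
        return (ns, dentry) in self.mounted


def replay():
    """Replay: mount in 'init', clone to 'child', then expire in 'init'."""
    m = ToyMounts()
    path = "/mnt/auto/dir"                # hypothetical automount point
    m.mount("init", path)
    m.clone_private("init", "child")
    m.umount("init", path)                # expire in the originating namespace
    after_expire = (m.mounted_anywhere(path), m.mounted_in("init", path))
    m.umount("child", path)               # manual umount in the cloned namespace
    after_cleanup = (m.mounted_anywhere(path), m.mounted_in("init", path))
    return after_expire, after_cleanup


if __name__ == "__main__":
    print(replay())
```

In this model the "anywhere" check keeps reporting the dentry as mounted until
the clone is cleaned up, which corresponds to the ELOOP case, while the
namespace-local check lets the originating namespace mount again straight
after the expire.
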
Ian