Re: [PATCH] nfsd: try nfsdcld client tracker in containers

From: J. Bruce Fields
Date: Mon Mar 04 2013 - 15:18:56 EST


On Mon, Mar 04, 2013 at 03:04:22PM -0500, Jeff Layton wrote:
> On Mon, 4 Mar 2013 21:56:19 +0400
> Stanislav Kinsbursky <skinsbursky@xxxxxxxxxxxxx> wrote:
>
> > 04.03.2013 18:47, Jeff Layton wrote:
> > > On Mon, 4 Mar 2013 10:38:45 +0400
> > > Stanislav Kinsbursky <skinsbursky@xxxxxxxxxxxxx> wrote:
> > >> 3) UMH looks up and executes the binary from the current root. This problem keeps chasing all the containerisation work. So either the UMH logic has to be updated (which is not
> > >> trivial or easy to implement and push upstream), or the process root has to be swapped to the right (container's) one. This, BTW, is not that hard, because the UMH call
> > >> accepts an "init" callback, which can be used to swap the root right before do_execve() is called.
> > >>
> > >> What do you guys think about all this?
> > >>
> > > You mean just change to use call_usermodehelper_fns() and pass the
> > > correct namespace info? Yeah that looks like the easiest fix and sounds
> > > quite reasonable.
> > >
> >
> > Nope. The problem here is not a namespace, but the root path:
> > mount point + dentry.
> > do_execve() uses the root path from current to look up any string path.
> > And this root path is inherited from the kernel thread, which means
> > that it is the global "init" root path.
> > Thus, if we want to look up a path in a container (which could
> > have its own nested root), we have to swap the root in the usermode
> > helper thread before the do_execve() call.
> > We can use call_usermodehelper_fns() to pass an init callback and swap
> > the root in it.
> > But the problem here is that root swapping is not that... gentle.
> > I.e. we were trying to avoid it. For example, local SUNRPC transports
> > now connect synchronously (to make sure that the connection is made in
> > the proper root environment).
> > Nevertheless, I don't see any other way to containerize UMH so far.
> >
> > Bruce, what's your opinion about this?
> >
> >
>
> It doesn't seem all that awful here. Once we've spawned this new
> process it'll just run to completion and exit within the container. We
> shouldn't ever need to go back to the original root.
>
> In the sunrpc layer, it's a little more complicated since you're
> working with kernel threads and workqueues that may be shared between
> containers (right?).

Also, is this really the only place that needs this?

I guess NFS for now allows mounts only in the initial namespace, so the
idmap upcalls can always be done there. Module lookup and any usermode
helpers used for hardware setup probably also only happen in the initial
namespace. Hrm. Still, seems likely to be useful to someone
eventually.
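
For illustration only, here is a rough userspace analogue of the root swap
discussed above. It is not kernel code and not taken from any posted patch:
run_in_root() is a hypothetical name, and chroot() merely stands in for
swapping the helper's fs root in the UMH "init" callback. The point it shows
is that only the freshly spawned child changes its root, right before the
exec, so the caller never has to switch back.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `path` (resolved relative to `root`) in a child
 * process and return its exit status.
 *
 * Returns 126 if the child could not switch root (e.g. no CAP_SYS_CHROOT),
 * 127 if the exec itself failed, and -1 on fork/wait errors or if the child
 * did not exit normally.
 */
static int run_in_root(const char *root, const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(root != NULL && path != NULL && argv != NULL);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * Child only: this plays the role of the "init" callback.
         * The root is changed here, just before the exec, so the
         * path lookup done by execv() happens in the new root.
         */
        if (chroot(root) != 0 || chdir("/") != 0)
            _exit(126);
        execv(path, argv);
        _exit(127);     /* only reached if execv() failed */
    }

    /* Parent: its own root is untouched; just collect the result. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the container's root directory as `root` and a path
that is valid inside that root, for example
run_in_root("/", "/bin/sh", argv) as a trivial case. Without privilege to
change root, the call reports 126 rather than running the binary from the
wrong root.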

--b.