Re: [RFC PATCH 5/8] KEYS: exec request-key within the requesting task's init namespace
From: Ian Kent
Date: Fri Feb 20 2015 - 22:59:19 EST
On Fri, 2015-02-20 at 14:05 -0500, J. Bruce Fields wrote:
> On Fri, Feb 20, 2015 at 12:07:15PM -0600, Eric W. Biederman wrote:
> > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:
> >
> > > On Fri, Feb 20, 2015 at 05:33:25PM +0800, Ian Kent wrote:
> >
> > >> The case of nfsd state-recovery might be similar but you'll need to help
> > >> me out a bit with that too.
> > >
> > > Each network namespace can have its own virtual nfs server. Servers can
> > > be started and stopped independently per network namespace. We decide
> > > which server should handle an incoming rpc by looking at the network
> > > namespace associated with the socket that it arrived over.
> > >
> > > A server is started by the rpc.nfsd command writing a value into a magic
> > > file somewhere.
> >
> > nit. Unless I am completely turned around that file is on the nfsd
> > filesystem, that lives in fs/nfsd/nfs.c.
> >
> > So I believe this really is a case of figuring out what we want the
> > semantics to be for mount and propagating the information down from
> > mount to where we call the user mode helpers.
>
> Oops, I agree. So when I said:
>
> The upcalls need to happen consistently in one context for a
> given virtual nfs server, and that context should probably be
> derived from rpc.nfsd's somehow.
>
> Instead of "rpc.nfsd's", I think I should have said "the mounter of
> the nfsd filesystem".
>
> Which is already how we choose a net namespace: nfsd_mount and
> nfsd_fill_super store the current net namespace in s_fs_info. (And then
> grep for "netns" to see the places where that's used.)
This is going to be mostly a restatement of what's already been said,
partly for me to refer back to later and partly to clarify and confirm
what I need to do, so prepare to be bored.
As a result of Oleg's recommendations and comments, the next version of
the series will take a reference to an nsproxy and a user namespace
(from the init process of the calling task, while it's still a child of
that task) rather than carrying around task structs. There are still a
couple of questions with this so it's not quite there yet.
We'll have to wait and see if what I've done is enough to remedy Oleg's
concerns too. LOL, and then there's the question of how much I'll need
to do to get it to actually work.
The other difference is that obtaining the context (now an nsproxy and a
user namespace) is done entirely within the usermode helper. I think
that's a good thing from the point of view of the calling process
isolation requirement. That may need to change again based on the
discussion here.
Now that we're starting to look at actual usage, it's worth keeping in
mind that how to execute within the required namespaces has to be sound
before we tackle use cases that have requirements beyond this
fundamental functionality.
There are a couple of things to think about.
One is how to work out whether UMH_USE_NS is needed, and another is how
to provide persistent usage of particular namespaces across containers.
The latter will probably relate to the origin of the file system (which
looks like it will be identified at mount time).
The first case is when the mount originates in the root init namespace.
Most of the time (if not all the time) UMH_USE_NS doesn't need to be set
and the helper should run in the root init namespace. That should work
for mount propagation as well, with mounts bound into a container.
Is this also true for automounted mounts at mount point crossing? Or
perhaps I should ask, should automounted NFS mounts inherit the property
from their parent mount?
The second case is when the mount originates in another namespace,
possibly a container. TBH I haven't thought too much about mounts that
originate from namespaces created by unshare(1) or other sources yet.
I'm hoping that will just work once this is done ;)
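As a rough illustration of the two cases described so far (not code from
the series), the decision could be modelled in userspace C as below. The
value given to UMH_USE_NS, the struct mount_origin type and the
umh_flags_for_mount() function are all assumptions made up for this
sketch; automounted mounts and the persistence question are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value: the real flag is defined by the patch series and
 * its value is not stated in this thread. */
#define UMH_USE_NS 0x1u

/* Hypothetical record of where a mount originated. */
struct mount_origin {
	bool in_root_init_ns;	/* mounted from the root init namespace? */
};

/* Return the helper flags a mount with this origin would ask for. */
unsigned int umh_flags_for_mount(const struct mount_origin *origin)
{
	assert(origin != NULL);

	/* First case: root init namespace origin, run the helper there. */
	if (origin->in_root_init_ns)
		return 0;

	/* Second case: origin elsewhere (possibly a container), so ask
	 * for "in namespace" execution. */
	return UMH_USE_NS;
}
```

Whether automounted mounts should simply copy the origin of their parent
is exactly the open question above, so the sketch takes no position on it.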
The last time I tried binding NFS mounts from one container into another
it didn't work, but if we assume that will work at some point then, as
Bruce points out, we need to provide the ability to record the
namespaces to be used for subsequent "in namespace" execution while
maintaining caller isolation (i.e. derived from the caller's init
process).
I've been aware of the need for persistence for a while now and I've
been thinking about how to do it but I don't have a clear plan quite
yet. Bruce, having noticed this, has described details about the
environment I have to work with so that's a start. I need the thoughts
of others on this too.
As a result I'm not sure yet if this persistence can be integrated into
the current implementation or if additional calls will be needed to set
and clear the namespace information while maintaining the needed
isolation.
As Bruce says, perhaps the namespace information should be saved as
properties of a mount or perhaps it should be a list keyed by some
handle, the handle being the saved property. I'm not sure yet but the
latter might be an unnecessary complication and overhead. The cleanup of
the namespace information upon summary termination of processes could be
a bit difficult, but perhaps it will be as simple as making it a function
of freeing the object it's stored in (in the cases we have so far that
would be the mount).
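To make the "cleanup as a function of freeing the object" idea concrete,
here is a minimal userspace model, assuming the saved context is just a
pair of counted references owned by a mount-like object. Every name in
it (ns_obj, umh_ns_ctx, mount_model and the helper functions) is
hypothetical, and the plain int counters stand in for whatever atomic
reference counting the kernel objects really use.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a refcounted namespace object (nsproxy or user ns). */
struct ns_obj {
	int refcount;	/* plain int: this model is single-threaded */
};

void ns_get(struct ns_obj *ns)
{
	ns->refcount++;
}

void ns_put(struct ns_obj *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* Hypothetical saved context: namespace references, no task struct. */
struct umh_ns_ctx {
	struct ns_obj *nsproxy;
	struct ns_obj *user_ns;
};

/* Hypothetical mount-like object that owns the saved context. */
struct mount_model {
	struct umh_ns_ctx ctx;
};

/* Record the namespaces at "mount" time, taking a reference on each. */
struct mount_model *mount_model_create(struct ns_obj *nsproxy,
				       struct ns_obj *user_ns)
{
	struct mount_model *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	ns_get(nsproxy);
	ns_get(user_ns);
	m->ctx.nsproxy = nsproxy;
	m->ctx.user_ns = user_ns;
	return m;
}

/* Freeing the owner is the only place the references are dropped. */
void mount_model_free(struct mount_model *m)
{
	if (!m)
		return;
	ns_put(m->ctx.nsproxy);
	ns_put(m->ctx.user_ns);
	free(m);
}
```

With this shape nothing has to notice a process going away: the saved
references live exactly as long as the object that holds them. A list
keyed by a handle would need the same release step plus a lookup.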
So, yes, I've still got a fair way to go ;)
Ian