Re: [RFC][PATCH] sunrpc: fix oops in rpc_create() when the mount namespace is unshared

From: Chuck Lever
Date: Wed Sep 10 2008 - 16:57:28 EST


On Sep 10, 2008, at 4:02 PM, ebiederm@xxxxxxxxxxxx wrote:
"Chuck Lever" <chuck.lever@xxxxxxxxxx> writes:
That makes sense.

This is likely coming from lockd_down(), and is almost certainly not
coming from the same uts namespace as the lockd_up() that did the
pmap_set, which was done by the first NFS mount done in the first uts
namespace on the system. It's just something that the kernel has to
do for maintenance.

There is only one lockd() instance that is shared among all the uts
namespaces, right? In this case, what is the correct utsname to use?

Interesting.

As a general rule I would say we should capture the uts instance
in lockd_up(), and use the same instance in lockd_down().

I'm not at all familiar with how lockd interacts with nfs mounts
in a practical sense. Is there one lockd instance (or at least context)
per nfs mount?

The way I would expect things to work is that when we mount an nfs filesystem
from an nfs server, we would create a lockd context for that server, which
additional nfs mounts to the same nfs server could share.

There is one lockd, one statd, and one rpcbind per client. These are shared between all the NFS mounts on the client. Likewise, there is one of each of these per server, and they are shared among all exports.

lockd_up() and lockd_down() maintain a count of mounts and exports, and lockd_down() shuts down lockd when the count goes to zero.
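
As a rough illustration of that counting, here is a user-space sketch. The names svc_model, model_up() and model_down() are hypothetical and this is not the actual lockd code; it only models "start on the first user, stop when the count reaches zero".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's lockd implementation. */
struct svc_model {
    unsigned int users;   /* mounts plus exports currently using the service */
    bool running;         /* true while the modelled daemon is up */
};

/* Take a reference; the first user starts the service. */
static void model_up(struct svc_model *s)
{
    if (s->users == 0)
        s->running = true;
    s->users++;
}

/* Drop a reference; the last user stops the service. */
static void model_down(struct svc_model *s)
{
    assert(s->users > 0);   /* an unbalanced "down" is a caller bug */
    s->users--;
    if (s->users == 0)
        s->running = false;
}
```

Note that nothing in this model records which caller brought the service up, so the final model_down() can come from a different context than the first model_up().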

statd provides the ability to signal a server when a client reboots (and vice versa). This gives the server an indication of when to free locks for any applications on a rebooting client, and gives the client an indication of when it needs to reclaim locks on a rebooting server.

statd (user space) and lockd (kernel) have to share a cookie (mon_name) which is used to identify the client to servers, and the server to clients, so reboots can be detected. That cookie would probably need to be the initial utsname.

The way I would expect nfs to interact with the namespaces is for the nfs
mount to capture the uts and network namespaces, and use them for all
transactions relating to the mount.

That works for the main NFS protocol, perhaps, but the auxiliary protocols are another matter. They operate on behalf of a whole client or server, not on behalf of an individual mount or export.

In particular, when creating a lockd context, the nfs mount would use the
uts namespace and the network namespace as discriminators to see if an
existing lockd context is the same.
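
To make the discriminator idea concrete, a minimal user-space sketch follows. Everything in it is an assumption for illustration: lockd_ctx_model, ctx_table and ctx_get() are hypothetical names, namespaces are modelled as opaque pointers compared by identity, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTX 8   /* arbitrary table size for the sketch */

/* Hypothetical model of a per-namespace lockd context. */
struct lockd_ctx_model {
    const void *uts_ns;   /* identity of the uts namespace */
    const void *net_ns;   /* identity of the network namespace */
    unsigned int users;   /* mounts sharing this context; 0 means slot is free */
};

static struct lockd_ctx_model ctx_table[MAX_CTX];

/*
 * Return the context matching (uts_ns, net_ns), creating it on first use.
 * Returns NULL when the table is full.
 */
static struct lockd_ctx_model *ctx_get(const void *uts_ns, const void *net_ns)
{
    struct lockd_ctx_model *free_slot = NULL;

    assert(uts_ns != NULL && net_ns != NULL);
    for (size_t i = 0; i < MAX_CTX; i++) {
        struct lockd_ctx_model *c = &ctx_table[i];

        if (c->users > 0 && c->uts_ns == uts_ns && c->net_ns == net_ns) {
            c->users++;          /* existing context: share it */
            return c;
        }
        if (c->users == 0 && free_slot == NULL)
            free_slot = c;       /* remember the first free slot */
    }
    if (free_slot == NULL)
        return NULL;
    free_slot->uts_ns = uts_ns;
    free_slot->net_ns = net_ns;
    free_slot->users = 1;
    return free_slot;
}
```

Two mounts made from the same pair of namespaces get the same context back, while a mount from a different uts namespace gets its own.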

Possible, but I would expect this to be a lot of work for not much gain. The right answer is likely that you need a lockd and statd instance (virtual or real) for each namespace. The mounts and exports in each namespace would have their own lockd and statd.

I don't think nfs has a 1-1 thread to context model which is where things
get really hazy for me.

Users are assigned credentials. The credentials are passed from client to server, and the server does work on behalf of that credential (user). lockd uses a credential and a process identifier to find locks on files.

AUTH_SYS credentials (the lowest common denominator) are constructed from the user's UID and GID and the client's utsname.

The kernel, then, will have to construct unique credentials for users in each uts namespace. This is likely not an NFS mount-time issue, but is instead part of the mechanism of mapping requests from processes to RPC credentials.
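
A small sketch of why the utsname matters here, under stated assumptions: authsys_model, authsys_fill() and authsys_same() are hypothetical names, and the struct is a simplified in-memory view of the AUTH_SYS parameters (the stamp and supplementary gids are left out). It is not how the kernel RPC client builds credentials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define AUTH_SYS_MAX_NAME 255   /* machinename<255> in RFC 5531, Appendix A */

/* Simplified, hypothetical view of an AUTH_SYS credential. */
struct authsys_model {
    char machinename[AUTH_SYS_MAX_NAME + 1];   /* client's node name */
    unsigned int uid;
    unsigned int gid;
};

/* Fill a credential from the caller's ids and the node name it sees. */
static void authsys_fill(struct authsys_model *c, const char *nodename,
                         unsigned int uid, unsigned int gid)
{
    assert(c != NULL && nodename != NULL);
    /* snprintf truncates over-long names and always NUL-terminates. */
    snprintf(c->machinename, sizeof(c->machinename), "%s", nodename);
    c->uid = uid;
    c->gid = gid;
}

/* Two credentials are the same only if name, uid and gid all match. */
static bool authsys_same(const struct authsys_model *a,
                         const struct authsys_model *b)
{
    return a->uid == b->uid && a->gid == b->gid &&
           strcmp(a->machinename, b->machinename) == 0;
}
```

With this model, the same UID and GID seen through two uts namespaces with different node names yield credentials that do not compare equal.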

The conservative play is to always force use of the initial namespace
and to deny creation of mounts that would use different namespaces, in part
because the initial version of the namespace always exists. Which means
that, as it relates to Cedric's initial patch, we would still need to know
which mounts should cause us to use a different uts namespace so we can
deny them.

OK. I think what you are saying is that NFS won't work outside of the initial uts namespace, for now?

Also, how would an automounter fit into this uts namespace scheme?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com