Re: [PATCH -mm 5/7] add user namespace
From: Kyle Moffett
Date: Sat Jul 15 2006 - 09:29:12 EST
On Jul 15, 2006, at 08:35:18, Eric W. Biederman wrote:
Kyle Moffett <mrmacman_g4@xxxxxxx> writes:
With NFS and the proposed superblock-sharing patches (necessary
for efficiency and other reasons I don't entirely understand),
the situation is worse: A mount of server:/foo/bar on / in the
bar virtual machine may get its superblock merged with a mount of
server:/ foo/baz on / in the baz virtual machine. If it's
efficient to merge those superblocks we should, and once again
it's necessary to tie the UID namespace to the vfsmount, not the
superblock.
I completely agree that pushing nameidata down into
generic_permission where we can use per mount properties in our
permission checks is ideal. The benefit I see is just a small
increase in flexibility. So I don't really care either way.
Currently there are several additional flags that could benefit
from a per vfsmount interpretation as well: nosuid, noexec, nodev,
and readonly, how do we handle those?
noexec is on the vfsmount.
nosuid is on the vfsmount
nodev is on the vfsmount
readonly is not on the vfsmount.
The existing precedent is clearly in favor of putting this kind of
information on the vfsmount. The read-only attribute seems to be
the only hold out. If readonly has deep implications like no
journal replay it makes sense to keep it per mount. Which
indicates we could nose a nowrite option to express the per
vfsmount property.
Well, speaking of that; there's been another thread recently that's
splitting the properties of read-only between vfsmount and
superblock. So a read-only superblock implies read-only vfsmounts,
but the following can create a read-only vfsmount for a writable
superblock:
mount --bind / /mnt/read-only-root
mount -o ro,remount /mnt/read-only-root
So the readonly special case will go away.
I hope the confusion has passed for Trond. My impression was he
figured this was per process data so it didn't make sense any where
near a filesystem, and the superblock was the last place it should be.
One of the things I said earlier in this thread is that "Both
filesystems _and_ processes should be first-class objects in any UID
namespace". In order to have sufficient access controls in the
presence of _only_ a UID-namespace (as opposed to with full container
isolation), you need to check against an object *and* the namespace
in which it is located. In some cases, the object is a file, which
means that the inode, vfsmount, or superblock need a UID namespace
reference. Theoretically a you could implement per-file UID
namespace pointers, but that would probably be incredibly
inefficient. IMHO, per-vfsmount gives the best flexibility and
efficiency of the three.
In fact, it's strange to think about this in context with the rest of
the namespaces that are being designed, but processes would
ordinarily *not* have primary presence in a UID namespace if they
weren't the target of UID-verified operations in and of themselves
(EX: kill, ptrace, etc). Otherwise they would just have a set of
(namespace,UID,cap_flags) pairs to give them access to filesystems in
specific uid namespaces.
Cheers,
Kyle Moffett
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/