Re: [PATCH -mm 5/7] add user namespace

From: Eric W. Biederman
Date: Sat Jul 15 2006 - 08:35:16 EST


Kyle Moffett <mrmacman_g4@xxxxxxx> writes:

> Here's a possible example:
>
> I have one disk which I want to share between multiple virtualized instances
> for root filesystems. I bind-mount /onedisk/foo as the foo virtual machine's
> root and /onedisk/bar as the bar virtual machine's root. There should (must)
> be two interpretations of the linear UID space on that disk, one for the foo
> virtual machine, and one for the bar virtual machine. By allowing the
> administrator to determine UID namespace per-vfsmount, you make such an
> arrangement possible where it otherwise would not be.

Yes.

With the scenario you describe there is a confusing case: how do
you interpret uids on the /onedisk mount itself? Mapping uids may be
a more appropriate strategy to remove all confusion there.
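
A rough userspace sketch of the uid mapping idea, for illustration only.
Nothing here comes from the patch: the struct names, the
disk_uid_to_global() helper, and the range-based mapping are all
assumptions. Each mount carries its own translation from on-disk uids
to system-wide uids, so the same on-disk uid under /onedisk/foo and
/onedisk/bar resolves to different users.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int uid_model_t;        /* stand-in for the kernel's uid_t */

#define UID_UNMAPPED ((uid_model_t)-1)   /* hypothetical "no mapping" marker */

/* Hypothetical mapping: on-disk uids [disk_first, disk_first + count)
 * correspond to system-wide uids [global_first, global_first + count). */
struct uid_map_model {
    uid_model_t disk_first;
    uid_model_t global_first;
    uid_model_t count;
};

/* Hypothetical stand-in for a vfsmount that owns its uid interpretation. */
struct vfsmount_model {
    const char *root;
    struct uid_map_model map;
};

/* Translate an on-disk uid as seen through one particular mount. */
uid_model_t disk_uid_to_global(const struct vfsmount_model *mnt,
                               uid_model_t disk_uid)
{
    assert(mnt != NULL);
    const struct uid_map_model *m = &mnt->map;

    /* Unsigned arithmetic: the second test is only reached when
     * disk_uid >= disk_first, so the subtraction cannot wrap. */
    if (disk_uid < m->disk_first || disk_uid - m->disk_first >= m->count)
        return UID_UNMAPPED;
    return m->global_first + (disk_uid - m->disk_first);
}
```

With one such mapping per mount, the raw uids seen on /onedisk itself
would be the unmapped on-disk values, which is one way to answer the
question above.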

> With NFS and the proposed superblock-sharing patches (necessary for efficiency
> and other reasons I don't entirely understand), the situation is worse: A
> mount of server:/foo/bar on / in the bar virtual machine may get its superblock
> merged with a mount of server:/foo/baz on / in the baz virtual machine. If
> it's efficient to merge those superblocks we should, and once again it's
> necessary to tie the UID namespace to the vfsmount, not the
> superblock.

I completely agree that pushing nameidata down into generic_permission
where we can use per mount properties in our permission checks is
ideal. The benefit I see is just a small increase in flexibility.
So I don't really care either way.

Currently there are several additional flags that could benefit
from a per vfsmount interpretation as well: nosuid, noexec, nodev,
and readonly. How do we handle those?

noexec is on the vfsmount.
nosuid is on the vfsmount.
nodev is on the vfsmount.
readonly is not on the vfsmount.

The existing precedent is clearly in favor of putting this kind of
information on the vfsmount. The read-only attribute seems to
be the only holdout. If readonly has deep implications like
no journal replay it makes sense to keep it per mount, which
indicates we could use a nowrite option to express the per
vfsmount property.
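
To make the split concrete, here is a small userspace model, not kernel
code. The flag names are modeled on the kernel's MNT_* and MS_RDONLY
flags but the values are illustrative, the structs are stand-ins, and
the nowrite flag is hypothetical: it only illustrates the option
suggested above. A write is allowed only if neither the filesystem-wide
readonly flag nor the per vfsmount nowrite flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per vfsmount flags (illustrative values). */
#define MNT_NOSUID_MODEL   0x01u
#define MNT_NODEV_MODEL    0x02u
#define MNT_NOEXEC_MODEL   0x04u
#define MNT_NOWRITE_MODEL  0x08u   /* hypothetical: the proposed nowrite */

/* Filesystem-wide flag (illustrative value). */
#define SB_RDONLY_MODEL    0x01u

struct super_block_model {
    unsigned int s_flags;              /* shared by every mount of this fs */
};

struct vfsmount_model {
    unsigned int mnt_flags;            /* private to this mount */
    struct super_block_model *mnt_sb;
};

/* exec permission depends only on the per vfsmount flag. */
bool mnt_may_exec(const struct vfsmount_model *mnt)
{
    assert(mnt != NULL);
    return !(mnt->mnt_flags & MNT_NOEXEC_MODEL);
}

/* write permission needs both levels to agree. */
bool mnt_may_write(const struct vfsmount_model *mnt)
{
    assert(mnt != NULL && mnt->mnt_sb != NULL);
    if (mnt->mnt_sb->s_flags & SB_RDONLY_MODEL)
        return false;                  /* whole filesystem is read-only */
    return !(mnt->mnt_flags & MNT_NOWRITE_MODEL);
}
```

Two mounts sharing one superblock can then differ in whether they allow
writes, while a readonly superblock still blocks writes through both.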

I hope the confusion has passed for Trond. My impression was he
figured this was per process data so it didn't make sense anywhere
near a filesystem, and the superblock was the last place it should
be.

Eric