Re: Building a BSD-jail clone out of namespaces

From: Chris Webb
Date: Thu Jun 06 2013 - 17:51:59 EST

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> writes:

> Hmm. I guess it depends on how your VM is reading them. If it is
> block-based access to the filesystem, you have a problem. If the VM
> is effectively NFS mounting the filesystem, you can do all kinds of
> things.
> It is possible to just change the user namespace and set up your mapping,
> effectively running your VM in the user namespace, and that would allow
> the VM to see your mapped uids.

In some cases I was thinking of mounting a filesystem directly from a block
device, but more often it would be directories in a local host filesystem.
At present I pass these in using qemu's built-in virtio 9p-over-PCI.

So in principle, that does mean I could store UIDs in translated form and
wrap everything else I do at host level in a userns translation layer as
well, but that is quite an intrusive thing to do, and I imagine it would
preclude lightweight throwaway containers into which I share the host
filesystem read-only.
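To make the mapping idea concrete, here is a minimal sketch in C of what entering a user namespace and installing a uid/gid mapping looks like from userspace. It assumes a Linux kernel with unprivileged user namespaces enabled; the helper names (format_id_map, write_proc_file, enter_userns_as_root) are hypothetical, and the /proc/self/setgroups step only exists on kernels from 3.19 onward. An unprivileged caller may only map its own effective uid and gid, so a wider range such as "0 100000 65536" would need a privileged helper to write the map files.

```c
#define _GNU_SOURCE            /* for unshare() and CLONE_NEWUSER */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: format one uid_map/gid_map line,
 * "<id inside ns> <id outside ns> <count>\n". */
int format_id_map(char *buf, size_t len,
                  unsigned inside, unsigned outside, unsigned count)
{
    return snprintf(buf, len, "%u %u %u\n", inside, outside, count);
}

/* Hypothetical helper: write a whole string to a /proc file in a single
 * write(2), since the kernel expects each map to arrive in one write. */
int write_proc_file(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(buf);
    ssize_t n = write(fd, buf, (size_t)len);
    int saved = (n < 0) ? errno : EIO;   /* treat a short write as EIO */
    close(fd);
    if (n == len)
        return 0;
    errno = saved;
    return -1;
}

/* Hypothetical helper: move into a new user namespace in which the
 * caller's current effective uid/gid appear as 0. Must be called before
 * any threads are created, or unshare() fails with EINVAL. */
int enter_userns_as_root(void)
{
    char line[64];
    uid_t uid = geteuid();   /* capture ids before unsharing */
    gid_t gid = getegid();

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;

    /* Kernels >= 3.19 require setgroups to be denied before an
     * unprivileged gid_map write; older kernels lack the file. */
    if (write_proc_file("/proc/self/setgroups", "deny") < 0 && errno != ENOENT)
        return -1;

    format_id_map(line, sizeof line, 0, (unsigned)uid, 1);
    if (write_proc_file("/proc/self/uid_map", line) < 0)
        return -1;

    format_id_map(line, sizeof line, 0, (unsigned)gid, 1);
    if (write_proc_file("/proc/self/gid_map", line) < 0)
        return -1;

    return 0;
}
```

If enter_userns_as_root() succeeds, getuid() inside the process should report 0, while the ownership recorded on disk is unchanged; the translation happens only at the namespace boundary.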

This is why I was quite keen to avoid mangled ownership in host filesystems
altogether, but from what you say, that goal sounds rather tricky to
achieve.

> There are too many things in /proc and /sys and similar that
> grant access to uid == 0.

Ah yes, I can see why this is a thorny one. Is it just synthetic
filesystems like /proc and /sys that are the problem, or are there loads of
other places in the kernel that assume uid == 0 implies privilege? That is,
is it 'just' a matter of somehow securing access to procfs and sysfs, or a
much wider issue?

Best wishes,