Re: device namespaces

From: Christian Brauner
Date: Wed Jun 09 2021 - 04:09:26 EST


On Wed, Jun 09, 2021 at 09:54:05AM +0200, Hannes Reinecke wrote:
> On 6/9/21 9:21 AM, Christian Brauner wrote:
> > On Wed, Jun 09, 2021 at 09:02:36AM +0200, Hannes Reinecke wrote:
> >> On 6/9/21 8:38 AM, Christian Brauner wrote:
> >>> On Tue, Jun 08, 2021 at 12:16:43PM -0500, Eric W. Biederman wrote:
> >>>> Hannes Reinecke <hare@xxxxxxx> writes:
> >>>>
> >>>>> On 6/8/21 4:29 PM, Christian Brauner wrote:
> >>>>>> On Tue, Jun 08, 2021 at 04:10:08PM +0200, Hannes Reinecke wrote:
> >> [ .. ]
> >>>>> Granted, modifying sysfs layout is not something for the faint-hearted,
> >>>>> and one really has to look closely to ensure you end up with a
> >>>>> consistent layout afterwards.
> >>>>>
> >>>>> But let's see how things go; might well be that it turns out to be too
> >>>>> complex to consider. Can't tell yet.
> >>>>
> >>>> I would suggest aiming for something like devptsfs without the
> >>>> complication of /dev/ptmx.
> >>>>
> >>>> That is a pseudo filesystem that has a control node and virtual block
> >>>> devices that were created using that control node.
> >>>
> >>> Also see android/binder/binderfs.c
> >>>
> >> Ah. Will have a look.
> >
> > I implemented this a few years back and I think it should've made it
> > onto Android by default now. So that approach does indeed work well, it
> > seems:
> > https://chromium.googlesource.com/aosp/platform/system/core/+/master/rootdir/init.rc#257
> >
> > This should be easier to follow than the devpts case because you don't
> > need to wade through the {t,p}ty layer.
> >
> >>
> >>>>
> >>>> That is the cleanest solution I know and is not strictly limited to use
> >>>> with containers so it can also gain greater traction. The interaction
> >>>> with devtmpfs should be simply having devtmpfs create a mount point for
> >>>> that filesystem.
> >>>>
> >>>> This could be a new cleaner api for things like loopback devices.
> >>>
> >>> I sent a patchset that implemented this last year.
> >>>
> >> Do you have a pointer/commit hash for this?
> >
> > Yes, sure:
> > https://lore.kernel.org/linux-block/20200424162052.441452-1-christian.brauner@xxxxxxxxxx/
> >
> > You can also just pull my branch. I think it's still based on v5.7 or sm:
> > https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=loopfs
> >
> > I'm happy to collaborate on this too.
> >
> How _very_ curious. 'kernfs: handle multiple namespace tags' and 'loop:
> preserve sysfs backwards compability' are essentially the same patches I
> did for my block namespaces prototyp; I named it 'KOBJ_NS_TYPE_BLK', not
> 'KOBJ_NS_TYPE_USER', though :-)
>
> Guess we really should cooperate.
>
> Speaking of which: why did you name it 'user' namespace?
> There already is a generic 'user_namespace' in
> include/linux/user_namespace.h, serving as a container for all
> namespaces; as such it probably should include this 'user' namespace,
> leading to quite some confusion.
>
> Or did I misunderstood something here?

Ah yes, you misunderstand. The KOBJ_NS_TYPE_* tags are namespace tags.
So KOBJ_NS_TYPE_NET is a network namespace tag. So KOBJ_NS_TYPE_USER is
a user namespace tag not a completely new namespace. The idea very
roughly being that devices such as loop devices are ultimately filtered
by user namespace which is taken from the s_user_ns the loopfs instance
is mounted in. We should compare notes.

Christian