Re: device namespaces

From: Eric W. Biederman
Date: Fri Jun 11 2021 - 14:16:07 EST


Christian Brauner <christian.brauner@xxxxxxxxxx> writes:

> On Wed, Jun 09, 2021 at 09:54:05AM +0200, Hannes Reinecke wrote:
>> On 6/9/21 9:21 AM, Christian Brauner wrote:
>> > On Wed, Jun 09, 2021 at 09:02:36AM +0200, Hannes Reinecke wrote:
>> >> On 6/9/21 8:38 AM, Christian Brauner wrote:
>> >>> On Tue, Jun 08, 2021 at 12:16:43PM -0500, Eric W. Biederman wrote:
>> >>>> Hannes Reinecke <hare@xxxxxxx> writes:
>> >>>>
>> >>>>> On 6/8/21 4:29 PM, Christian Brauner wrote:
>> >>>>>> On Tue, Jun 08, 2021 at 04:10:08PM +0200, Hannes Reinecke wrote:
>> >> [ .. ]
>> >>>>> Granted, modifying sysfs layout is not something for the faint-hearted,
>> >>>>> and one really has to look closely to ensure you end up with a
>> >>>>> consistent layout afterwards.
>> >>>>>
>> >>>>> But let's see how things go; might well be that it turns out to be too
>> >>>>> complex to consider. Can't tell yet.
>> >>>>
>> >>>> I would suggest aiming for something like devptsfs without the
>> >>>> complication of /dev/ptmx.
>> >>>>
>> >>>> That is a pseudo filesystem that has a control node and virtual block
>> >>>> devices that were created using that control node.
>> >>>
>> >>> Also see android/binder/binderfs.c
>> >>>
>> >> Ah. Will have a look.
>> >
>> > I implemented this a few years back and I think it should've made it
>> > onto Android by default now. So that approach does indeed work well, it
>> > seems:
>> > https://chromium.googlesource.com/aosp/platform/system/core/+/master/rootdir/init.rc#257
>> >
>> > This should be easier to follow than the devpts case because you don't
>> > need to wade through the {t,p}ty layer.
>> >
>> >>
>> >>>>
>> >>>> That is the cleanest solution I know and is not strictly limited to use
>> >>>> with containers so it can also gain greater traction. The interaction
>> >>>> with devtmpfs should be simply having devtmpfs create a mount point for
>> >>>> that filesystem.
>> >>>>
>> >>>> This could be a new cleaner api for things like loopback devices.
>> >>>
>> >>> I sent a patchset that implemented this last year.
>> >>>
>> >> Do you have a pointer/commit hash for this?
>> >
>> > Yes, sure:
>> > https://lore.kernel.org/linux-block/20200424162052.441452-1-christian.brauner@xxxxxxxxxx/
>> >
>> > You can also just pull my branch. I think it's still based on v5.7 or sm:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=loopfs
>> >
>> > I'm happy to collaborate on this too.
>> >
>> How _very_ curious. 'kernfs: handle multiple namespace tags' and 'loop:
>> preserve sysfs backwards compability' are essentially the same patches I
>> did for my block namespaces prototyp; I named it 'KOBJ_NS_TYPE_BLK', not
>> 'KOBJ_NS_TYPE_USER', though :-)
>>
>> Guess we really should cooperate.
>>
>> Speaking of which: why did you name it 'user' namespace?
>> There already is a generic 'user_namespace' in
>> include/linux/user_namespace.h, serving as a container for all
>> namespaces; as such it probably should include this 'user' namespace,
>> leading to quite some confusion.
>>
>> Or did I misunderstood something here?
>
> Ah yes, you misunderstand. The KOBJ_NS_TYPE_* tags are namespace tags.
> So KOBJ_NS_TYPE_NET is a network namespace tag. So KOBJ_NS_TYPE_USER is
> a user namespace tag not a completely new namespace. The idea very
> roughly being that devices such as loop devices are ultimately filtered
> by user namespace which is taken from the s_user_ns the loopfs instance
> is mounted in. We should compare notes.

There are two easy possibilities.

- All of the devices on the filesystem show up in sysfs with unique
major minor numbers.
- None of the devices on the filesystem show up in sysfs.
(Which I believe is what devpts does).

I favor none of the virtual devices showing up in sysfs. Maybe existing
userspace needs the devices in sysfs, but if the solution is simply to
skip sysfs for virtual devices that is much simpler.

Eric