Re: devfs - the missing link

From: Alexander Viro (viro@math.psu.edu)
Date: Wed May 17 2000 - 21:38:43 EST


On Thu, 18 May 2000, Neil Brown wrote:

> 1/ namespaces - as in the plan9 idea that each process, or process
> grouping, can have a different namespace.
>
> You seem to be very keen on this idea. While I think that it is
> an elegant idea, I don't think it fits well with Unix, and I
> think that it is fairly independent of the main devfs related
> issues.

Per-process ones - for sure. Notice, however, that the combination of chroot()
(that we need) and an appropriate global namespace is all we need for chroot
jails. Per-process namespaces are neat, but they are, indeed, orthogonal
to the devfs problems.

> Free control of your namespace (as in plan9) does not sit well
> with setuid programs (which are absent from plan9). Many setuid
> programs "trust" specific path names, both to read/write and to
> execute. If you can control your name space, you can fool such
> programs easily. If setuid programs get the default namespace,
> you might have difficulty communicating with setuid programs
> ("lpr /this/file" "lpr: file not found").

Since you are asking: mount() and bind() (the Plan 9-ish one; the thing that
is currently done by mount -t bind (yuck)) done by non-root must require
write permission on the mountpoint. If luser has write permission on /etc -
well, mount is the least of your troubles...
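
A minimal user-space sketch of that rule, assuming the mountpoint is a
directory and that "non-root" means a real UID other than 0. The helper
name may_mount_here() is hypothetical; it is not an existing kernel or libc
interface, and a real check would live in the kernel's mount path.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical policy helper: may the calling user mount (or bind)
 * something on top of `mountpoint`?
 * Assumption: root is always allowed; anyone else needs write
 * permission on the mountpoint itself.
 */
bool may_mount_here(const char *mountpoint)
{
    struct stat st;

    assert(mountpoint != NULL);

    /* The mountpoint must exist and (in this sketch) be a directory. */
    if (stat(mountpoint, &st) != 0 || !S_ISDIR(st.st_mode))
        return false;

    /* Root is not restricted by this rule. */
    if (getuid() == 0)
        return true;

    /* access(2) checks against the real UID, matching getuid() above. */
    return access(mountpoint, W_OK) == 0;
}
```

With this rule an ordinary user can mount over a directory in their own
home, but not over /etc unless /etc is already writable by them.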

> Namespaces certainly have some nice features - like mounting all
> of the directories in your PATH onto /bin and others - but there
> are certainly problems too. As a sysadmin, the idea of allowing
> people to mess up their namespace and then complain "it don't
> work" worries me. If they mess up their filetree, I can go and
> look at it. If they mess up their namespace, it is much harder to
> look at what is really happening, especially after the fact.

Not really. IMO namespaces must be visible through virtual filesystem
(procfs or something else - it depends). Visible to root and creator of
namespace (if different from root). Again, per-process namespaces are
irrelevant here.

> 3/ structure of devfs - different directories or different filesystems.
>
> You seem to be really keen on the idea of having each device
> driver produce its own little filesystem-instance, and then to
> bind all of these together into a filetree, presumably controlled
> by some ASCII file.
> I don't really see what that would gain us. If we were doing a
> light-weight micro-kernel with drivers in separate processes (or
> whatever word is used) I could see the point, but we aren't.
> Linux, in the Unix tradition, is a monolithic kernel.

And? Sure, it's monolithic. That has nothing to do with the benefits of
having these trees separate.
        a) it avoids magic since we can use natural granularity (mount
one) instead of devfs-specific tricks.
        b) it avoids magic since we can use generic automount code instead
of <<--->>
        c) it avoids magic since it doesn't involve "magic links"
        d) it avoids magic since it puts the naming policy into the
userspace _without_ special daemon (or with much simpler one).
        e) it avoids magic since it kills the trickery with
->d_revalidate().
        f) it avoids politics since it closes the procfs vs. devfs issues.
        g) it avoids complexity since the code for each of those
mini-filesystems is _much_ simpler than current devfs or procfs (see
ramfs).
        h) with your per-mountpoint default permissions (kudos) it avoids
magic with maintaining the permissions on these objects.

        The bottom line being: it uses simple and understandable _generic_
mechanisms instead of inventing special-case tricks and it makes the
kernel code much simpler.

> Or there is the example of the symlink from a disc drive to a
> mounted filesystem. When you mount a filesystem, you tell the
> filesystem module about a particular disc drive. This
> association is reflected in the symlink that gets created.
>
> I don't think this involves any drivers having non-local
> knowledge. It just involves drivers communicating with one
> another.

        ... which is more or less equivalent in terms of the cost when you
want to change the layout.

> Could you let us know what you think is the particular value of
> having different filesystems per driver instead of different
> directories per driver?

        See above.

>
> If it is that it allows us to bind together the trees in whatever
> organisation we like (as could possibly be suggested by one of
> your comments) then I think it is much better for the device tree
> to be very fixed/stable/predictable. There is a place of
> configurability, but it isn't in the device tree. It is in /dev,
> which contains gateways to the device tree.

        Thanks, but no thanks. Linus may approve the current devfs layout,
but IMO it's ugly as hell.

> 4/ The nature of device special files.
>
> You say
> > Why on the Earth do you _want_ to keep them in filesystem?
>
> Answer: because that is what Unix does. It keeps things in
> filesystems (see issue 2).
> What I think we are talking about here is the
> sysadmin/distribution chosen name and ACL for a kernel-defined
> object.
> This has to live in the filesystem so that it can hold an ACL.
> And it should live there because that is the best place to store names.

        mount(8) _already_ has to deal with access control. E.g. for
remote filesystems. Or for 'user' in options. So keeping ACL-style stuff
there (where it's visible, editable, can be backed up, grepped, etc.) is
Good Thing(tm).
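
As an illustration of the 'user' case, here is a small sketch that tests an
fstab-style entry for the "user" or "users" option with hasmntopt() from
<mntent.h>. The helper name entry_allows_user_mount() is hypothetical, and
this is not a description of how mount(8) is actually implemented.

```c
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: does this fstab entry carry an option that lets
 * an ordinary user mount it?  Assumption: "user" and "users" are the
 * only options considered; exact token matching is up to the libc.
 */
bool entry_allows_user_mount(const struct mntent *ent)
{
    assert(ent != NULL && ent->mnt_opts != NULL);

    return hasmntopt(ent, "user") != NULL ||
           hasmntopt(ent, "users") != NULL;
}
```

The point is only that the access-control information sits in a plain text
option string that ordinary tools can read.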

> You also say:
> > And leave the usual device nodes alone - they are very happy as they are.
>
> They may be happy, and you may be happy, but there are those who
> aren't.
>
> Given that device nodes currently only have 16 bits, and that
> dividing these bits up hierarchically just doesn't work any more
> - we have too many sorts of devices- what do you propose?
>
> 1/ convince me that it does work (admittedly, I haven't counted
> the devices myself).
> 2/ extend to 32 bits (64 bits? 128bits?) and hope that that will
> be enough?
> 3/ give up the illusion that it is an hierarchical name space and
> just allocate numbers sequentially, as devices are discovered?
> This makes the device number essentially meaningless and means
> that you have to have a dynamic /dev with all the attendant
> problems of dynamic ACLs etc.
> 4/ something else?

        Move new devices to the "small fs" scheme when we start running
out of numbers (close, but it hasn't happened yet). Look into procfs -
there are obvious devices (well, so they are S_IFREG instead of S_IFCHR -
BFD). Objections to procfs/devfs are mostly from granularity issues (and
from complexity of these beasts). Make it a really dumb fs (and I mean
_really_ dumb - just a dentry tree with a bunch of objects hanging from
it) and allow mounting it alone and you've solved most of the problems.
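
To make "a dentry tree with a bunch of objects hanging from it" concrete,
here is a user-space sketch of such a tree. It is only an analogy: struct
node, node_add() and node_lookup() are hypothetical names, not kernel
interfaces, and a real filesystem would use dentries and inodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One name in the tree; `object` is whatever the driver hangs there. */
struct node {
    char name[32];
    void *object;            /* NULL for a plain directory */
    struct node *parent;
    struct node *children;   /* first child */
    struct node *next;       /* next sibling */
};

/* Create a node and link it under `parent` (NULL for the root). */
struct node *node_add(struct node *parent, const char *name, void *object)
{
    struct node *n;

    assert(name != NULL);
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    strncpy(n->name, name, sizeof(n->name) - 1);
    n->object = object;
    n->parent = parent;
    if (parent != NULL) {
        n->next = parent->children;
        parent->children = n;
    }
    return n;
}

/* Walk a '/'-separated path downwards from `dir`; NULL if not found. */
struct node *node_lookup(struct node *dir, const char *path)
{
    while (dir != NULL && *path != '\0') {
        size_t len = strcspn(path, "/");

        if (len > 0) {
            struct node *c;

            for (c = dir->children; c != NULL; c = c->next)
                if (strlen(c->name) == len &&
                    strncmp(c->name, path, len) == 0)
                    break;
            dir = c;
        }
        path += len;
        if (*path == '/')
            path++;
    }
    return dir;
}
```

There is no naming policy and no permission logic in it; that is left to
whoever mounts the tree and where.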

> 5/ where should the kernel defined device name space live?
>
> I like //devices. You don't like //. I can agree with that,
> though it would be nice to have a permanent name that is
> independent of the root filesystem. However as the kernel
> already "knows" about /etc/init, I guess it can "know" about
> /devices or similar.
> You seem to suggest that the device namespace should remain
> separate from the "regular" namespace. While I could live with
> this, I don't agree.

There are different variants. Notice that we have a pseudo-namespace
already: "type" argument of mount() is essentially a name in a separate
namespace. I see no problems in
        mount -t dac960 -o union /dev
Why not? #<character> namespace is ugly beyond belief - dunno why they
went for that monster. OTOH, we _might_ mimic /srv convention, extending it
to drivers - hell knows... I would rather keep -t <foo> convention.
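
A sketch of what "a name in a separate namespace" means here: the type
string is looked up in a flat table of registered filesystem types, quite
apart from the file tree. The table, struct fs_type and find_fs_type() are
hypothetical user-space stand-ins, not the kernel's own registration code,
and a dac960 filesystem type is itself only the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the flat "-t <foo>" namespace. */
struct fs_type {
    const char *name;   /* what the user passes to mount -t */
    const char *what;   /* free-form description, for illustration */
};

/* Assumed contents; a driver would add its own entry at init time. */
static const struct fs_type fs_types[] = {
    { "ramfs",  "plain in-memory tree" },
    { "dac960", "per-driver device tree (proposed, hypothetical)" },
};

/* Resolve a type name; NULL means "no such filesystem type". */
const struct fs_type *find_fs_type(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(fs_types) / sizeof(fs_types[0]); i++)
        if (strcmp(fs_types[i].name, name) == 0)
            return &fs_types[i];
    return NULL;
}
```

Nothing in this lookup touches a path, which is why the driver's name needs
no place in the file tree until somebody mounts it.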
