Re: My $0.02 on devd and devfs

Horst von Brand (vonbrand@sleipnir.valparaiso.cl)
Mon, 11 Oct 1999 12:46:19 -0300


Richard Gooch <rgooch@ras.ucalgary.ca> said:
> H. Peter Anvin writes:
> [I'm breaking my silence because this is one of the few coherent posts
> on the subject, and because it raises a point that, while I've tried
> to address it before, in retrospect I might not have gotten the point
> across.]
> > By author: Nathan Hand <nathanh@chirp.com.au>
> > > HPA, would you be open to the idea of /proc/devices. This won't be
> > > a terrific loss of functionality from the existing devfs.

> > I have thought a lot about this, and I have been trying to avoid
> > sounding like I flame. I *do* believe that devfs is a very inelegant
> > solution, but it is a solution to a real problem. It is not, in my
> > opinion however, the *right* solution.
> [...]
> > The right solution -- which the devfs people have correctly identified
> > -- is a user-space daemon. However, once you have the user-space
> > daemon, "devd", I believe you neither need nor want the virtual
> > filesystem, in the general case. However, I can understand that in
> > some configurations (like embedded systems) it may be desirable.

> > This is what I would like to see:

> > * A device daemon, devd, which can add devices on demand. I was
> > thinking of one which would receive data packets like the following:
> >
> > <stub_name, type, major, first_minor, count, naming_scheme>
> >
> > e.g.
> >
> > <"ttyS", char, 4, 64, 192, "serial">
> >
> > ... where "serial" would mean the daemon should find the iterator
> > for this particular class in "/usr/lib/devd/serial.so".

Should probably be /lib/devd/<kernel-version>/serial.so, mirroring modules,
or perhaps even /lib/modules/<kernel-version>/devd/serial.so (less
namespace pollution, current way of backing up a kernel continues to work
(yes, very weak, I know)). You can't depend on /usr being mounted (its
device might be managed by devd, think initrd). Hmm... in that case, it
could probably share machinery with modutils... also, the iterators will
have to be part of the kernel, or at least be able to be compiled (better
yet: modprobe(8)ed or such) into it for a dynamic /dev.
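
To make the packet format concrete, here is a minimal user-space sketch in
Python of how a devd might expand one such packet into node descriptions.
The DevicePacket class, the serial_iterator function, the ITERATORS registry
and the plain stub-plus-index naming are all assumptions made for
illustration; the proposal only fixes the six fields. Nothing here calls
mknod(2), it only computes what would be created.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Node = Tuple[str, str, int, int]  # (name, type, major, minor)


@dataclass(frozen=True)
class DevicePacket:
    """One announcement, carrying the six fields from the proposal."""
    stub_name: str
    dev_type: str        # "char" or "block"
    major: int
    first_minor: int
    count: int
    naming_scheme: str   # selects the iterator, e.g. "serial"


def serial_iterator(pkt: DevicePacket) -> Iterator[Node]:
    """Hypothetical "serial" iterator: name is the stub plus a running index."""
    for i in range(pkt.count):
        yield (f"{pkt.stub_name}{i}", pkt.dev_type, pkt.major, pkt.first_minor + i)


# Hypothetical registry; a real devd would load these from shared objects.
ITERATORS = {"serial": serial_iterator}


def expand(pkt: DevicePacket) -> List[Node]:
    """Return every node described by one packet."""
    return list(ITERATORS[pkt.naming_scheme](pkt))


nodes = expand(DevicePacket("ttyS", "char", 4, 64, 192, "serial"))
print(nodes[0], nodes[-1], len(nodes))
```

With the example packet this yields ttyS0 at minor 64 through ttyS191 at
minor 255.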

> OK, we agree that fundamentally, the kernel has to provide device
> availability information in a consistent and coherent manner to
> user-space. Either /proc/device_notifier or devfs can provide
> this. There are two ways that /proc/device_notifier could work:
>
> - it's a true notifier, and doesn't maintain state (i.e. a list of
> what's already there). I see this as totally unworkable because devd
> would then not know about devices found before it starts

Agree.

> - it *does* maintain state, which is then a degenerate case of devfs.

In what sense? A dynamic /dev could tell me that there are new devices
(this needs extensions to the current system calls, and there isn't any
clear model for it either: if a new device appears under /dev/printers/,
Unix considers that /dev hasn't been touched at all), but I'll have to walk
over the tree to find out which one. That means at least one system call for
each node in there (hundreds or thousands on a loaded system, where that
kind of overhead is probably prohibitive), versus just one or two read(2)s
of a file which is then analyzed without further kernel involvement. So,
for efficiency reasons alone, you'll retain /proc/devices; then a dynamic
/dev is redundant.
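
As a rough illustration of the "one or two read(2)s" approach, the sketch
below parses text in the /proc/devices layout (a "Character devices:"
section and a "Block devices:" section, each with "major name" lines) and
finds new entries by comparing two snapshots. The sample contents and the
function name parse_devices are made up for the example; a stateful
/proc/device_notifier would presumably carry more detail than this.

```python
def parse_devices(text):
    """Parse /proc/devices-style text into a set of (type, major, name)."""
    found = set()
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "Character devices:":
            section = "char"
        elif line == "Block devices:":
            section = "block"
        elif section is not None:
            major, name = line.split(None, 1)
            found.add((section, int(major), name))
    return found


# Sample contents (invented), each standing for the result of a single read.
BEFORE = """Character devices:
  1 mem
  4 ttyS

Block devices:
  3 ide0
"""
AFTER = BEFORE + "  8 sd\n"

# The set difference gives the new devices without walking any tree.
new_devices = parse_devices(AFTER) - parse_devices(BEFORE)
print(new_devices)
```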

> So a stateful /proc/device_notifier could work. But I think devfs is
> a better approach, because:
>
> - it does not require the daemon to parse a file to work out what
> devices are present. A filesystem is a natural way to present a tree
> structure; a file is not. Devfs is moving towards a structure that
> also reflects the physical topology of the hardware (i.e. bus# and
> slot# will appear in device paths), which will reinforce this point

Right. But it is easy to build that from a flat file, and this is not
something that will be done second by second, but hour by hour, so the
overhead isn't significant. Besides, if the devices themselves are kept on
ext2, you get persistent information (owner, permissions, ACLs, last
access, ...) for free, and it is guaranteed to stay in sync, format-wise,
with the rest of the files on the machine.
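
For what it is worth, rebuilding a tree from a flat listing takes only a few
lines. The sketch below assumes the flat file yields one slash-separated
path per device; the paths themselves (bus and slot numbers included) are
hypothetical and only meant to resemble a topology-style layout.

```python
def build_tree(paths):
    """Turn flat "a/b/c" paths into nested dicts; leaf devices map to None."""
    root = {}
    for path in paths:
        parts = path.strip("/").split("/")
        node = root
        for part in parts[:-1]:
            # Descend, creating intermediate directories as needed.
            node = node.setdefault(part, {})
        node[parts[-1]] = None
    return root


# Hypothetical flat listing, one device per line in the real file.
flat = [
    "pci/bus0/slot3/scsi/disc0",
    "pci/bus0/slot3/scsi/disc1",
    "pci/bus0/slot5/ttyS0",
]
tree = build_tree(flat)
print(tree)
```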

> - not having the virtual FS means you don't trap FS events (like inode
> lookups) which means that you can't do module autoloading, nor can
> you speculatively create arbitrary namespaces

Why do you want to trap inode lookups? Trapping access to nonexistent
devices is done now for module loading; once the device file has been
created by devd, everything works just as today. Devices get added by
connecting them to the system, not by trying to access them.

Speculatively creating namespaces is something that can't be done
automatically AFAIKS, so it will have to be done by hand; using MAKEDEV or
such should be fine for that.

> - since you need to store the device tree structure in the kernel
> anyway (see above), you may as well allow it to be mounted, which
> gives maximum flexibility to users (and adds very little extra
> code).

There might very well be a quite different tree structure for human
consumption. I.e., /dev/printers/[0-5] is a parallel printer, a serial
port, a few USB printers and even a pipe to a "virtual printer" over the
net.

> /proc/device_notifier is a functional subset of devfs, but prevents
> users from choosing to have a virtual /dev.

Why a virtual /dev, in the first place?

> As I've said time and
> again, devfs gives *choice*. People can not use it at all, or mount it
> elsewhere and use devfsd to manage a disc-based /dev. Despite (often
> offensive) claims to the contrary, devfs won't stop people from
> maintaining their traditions.

> > * devd should not *delete* devices in normal operation, unless they
> > have been superseded. Deleting device nodes is generally a
> > destructive operation.

> Well, I don't agree, but that's a policy issue that the user can
> decide.

A daemon doing this on disk is in the user's hands, and potentially _much_
more flexible (offer half a dozen alternative handlers, or one Swiss army
knife with a forbidding configuration file syntax plus a frontend a la
linuxconf, or have everybody write their own as a bunch of scripts like
/etc/rc.d/ that call simple tools). Any dynamic /dev _forces_ decisions on
policy into the kernel, because access to kernel space is very limited from
userland unless you add a whole new forest under /proc or add oodles of
ioctl(2)s, or both. It is also always resident in RAM and part of the
kernel, so it will be severely limited in size and complexity for security
and stability reasons, and because of the very real need to run Linux on
hardware-challenged machines. Note that my kernel here is a bit larger than
1Mb; linuxconf (with which a full handler would have to be comparable,
roughly) is 700Kb, without libraries.
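
To show how small such a policy can be when it lives in user space, here is
a sketch of the "never delete unless superseded" rule as a plain function.
The name plan_changes, the action tuples and the sample nodes are all
assumptions for illustration; a different handler could swap in any other
policy without touching the kernel.

```python
def plan_changes(on_disk, announced):
    """Decide actions under a "never delete unless superseded" policy.

    Both arguments map node name -> (type, major, minor).
    """
    actions = []
    for name, ident in sorted(announced.items()):
        if name not in on_disk:
            actions.append(("create", name, ident))
        elif on_disk[name] != ident:
            # Same name, different numbers: the old node is superseded.
            actions.append(("replace", name, ident))
    # Nodes on disk that were not announced are deliberately left alone.
    return actions


# Invented sample state: lp0 exists on disk but is not announced.
on_disk = {
    "ttyS0": ("char", 4, 64),
    "lp0": ("char", 6, 0),
    "sda": ("block", 8, 16),
}
announced = {
    "ttyS0": ("char", 4, 64),
    "ttyS1": ("char", 4, 65),
    "sda": ("block", 8, 0),
}
plan = plan_changes(on_disk, announced)
print(plan)
```

Here lp0 stays on disk untouched, which is the point of the policy.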

> > Notice that this interface would *also* be usable for devfs (which
> > would have to include all the various iterators etc in kernel space,
> > but it would have to anyway), which makes devfs an optional,
> > isolated feature. This is a Good Thing: I don't have anything
> > against devfs as an *isolated* feature for the people who want to
> > use it (lazy/careless admins, embedded systems...) I *do* have a
> > problem with it becoming ubiquitous, and I do have a problem with it
> > being a requirement for each device driver. However, with this
> > configuration devd would effectively be the "standard" mode of
> > operation, and devfs would be an "alternate", using the same
> > interfaces.

> Having devfs in the kernel *absolutely does not* mean that each device
> driver has to call <devfs_register>. In the early days of the patch,
> not all the device drivers I use were patched. Nevertheless, my system
> continued to work.

In the end, if HPA's scheme gets implemented, there will be a
<dev_register> that handles all anyway, so this isn't the point. His point
(IIUC) is that showing this information through a dynamic /dev _forces_ you
to depend on it, whereas a /proc/devices can be cleanly complemented by a
dynamic /dev if need be, without affecting anything else.

-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile                               +56 32 672616
