Re: PROPOSAL: /proc/dev (new idea)

Kjetil Torgrim Homme (kjetilho@ifi.uio.no)
02 Jan 1998 04:58:07 +0100


(there's another suggestion regarding how to handle /dev one page down)

[Patrick St. Jean]

| May I make a suggestion? Has anyone ever looked at Solaris' device
| handling? They have a directory called /devices where the _REAL_
| devices are (they use really ugly nasty names for them) and then
| the _KERNEL_ when given a -r flag will recreate the symlinks in
| /dev.

Actually, it's done in user space by /usr/sbin/drvconfig (called from
/etc/rcS.d/S50drvconfig), and this will update the /devices directory
as well. Default permissions (i.e., when a new device file needs to be
created) will be taken from /etc/minor_perm. That system seems nice
and dandy. It's what happens during every boot that's hairy :-)

Every device has a name, like "zs" for Zilog serial. When you install
the hardware, the system will add an entry in the kernel saying that
"zs" has major 42. A line "zs 42" is added to /etc/name_to_major, and
"/sbus@1f,0/zs@f,1000000" 1 "zs"
"/sbus@1f,0/zs@f,1100000" 0 "zs"
is added to /etc/path_to_inst. E.g., the special file
/devices/sbus@1f,0/zs@f,1100000 is created with major 42 and minor 0.
Evidently, all needed hardware info is encoded in the device name.
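
To make the lookup concrete, here is a rough sketch of how the two
files combine into a major/minor pair. It assumes the simplified file
formats quoted above and that the minor number equals the instance
number, as in the example; the function names are made up for
illustration and are not part of any Solaris tool.

```python
import shlex


def parse_name_to_major(text):
    """Map driver name -> major number from name_to_major-style lines."""
    majors = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            majors[fields[0]] = int(fields[1])
    return majors


def parse_path_to_inst(text):
    """Return (physical path, instance, driver) tuples from path_to_inst-style lines."""
    entries = []
    for line in text.splitlines():
        fields = shlex.split(line)  # strips the double quotes around fields
        if len(fields) == 3:
            entries.append((fields[0], int(fields[1]), fields[2]))
    return entries


def device_numbers(name_to_major, path_to_inst):
    """Map each /devices path to (major, minor).

    Assumption for this sketch: minor == instance number.
    """
    majors = parse_name_to_major(name_to_major)
    return {
        "/devices" + path: (majors[driver], instance)
        for path, instance, driver in parse_path_to_inst(path_to_inst)
    }


NAME_TO_MAJOR = "zs 42\n"
PATH_TO_INST = (
    '"/sbus@1f,0/zs@f,1000000" 1 "zs"\n'
    '"/sbus@1f,0/zs@f,1100000" 0 "zs"\n'
)

numbers = device_numbers(NAME_TO_MAJOR, PATH_TO_INST)
```

With the sample data, numbers["/devices/sbus@1f,0/zs@f,1100000"] comes
out as (42, 0), matching the special file described above.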

/usr/sbin/devlinks will create entries in /dev, based on hints found
in /etc/devlink.tab. Disks are handled by /usr/sbin/disks. Tapes by
/usr/sbin/tapes. It's a mess, but it seems to work.

Now, if you simply move the root disk to another SCSI-controller, you
will have one hosed Solaris-system since it has no way of getting up
without all these files and symlinks matching.

I must say that my understanding of how this works is far from
complete, but I hope this gives an impression of what a rat's nest
this is. It does work well in practice if you leave it alone, though.


My proposal
===========

Keep /dev on ext2 or whatever. Use that as your database to remember
owners and permissions. The major/minor could fluctuate, however.

During boot, the kernel compiles a list of devices, and composes a
list somewhere in /proc, say /proc/devicelist [1]. The list could look
something like

[driver devname maj min c/b owner:group perm]

 ide0   hda     3   0   b   root:disk   0640
 ide0   hda1    3   1   b   root:disk   0640
 ide0   hda5    3   5   b   root:disk   0640
 scsi0  sda     8   0   b   root:disk   0640
 scsi0  sda1    8   1   b   root:disk   0640
 psaux  psaux   10  1   c   root:root   0664

and so on. owner:group and perm are just the _default suggestion_ from
the kernel.[2]
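
A user-space reader for such a list could be quite small. This is only
a sketch under the assumption that the format is exactly the seven
whitespace-separated fields shown above; /proc/devicelist does not
exist, and the Device record and parse_devicelist name are invented
here for illustration.

```python
from collections import namedtuple

# One record per line of the hypothetical /proc/devicelist.
Device = namedtuple("Device", "driver name major minor kind owner group perm")


def parse_devicelist(text):
    """Parse 'driver devname maj min c/b owner:group perm' lines."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the bracketed header
        fields = line.split()
        if len(fields) != 7:
            continue  # ignore anything that is not a full record
        driver, name, major, minor, kind, ownership, perm = fields
        owner, group = ownership.split(":")
        devices.append(
            Device(driver, name, int(major), int(minor), kind,
                   owner, group, int(perm, 8))  # perm is written in octal
        )
    return devices


SAMPLE = """\
[driver devname maj min c/b owner:group perm]

ide0 hda 3 0 b root:disk 0640
ide0 hda1 3 1 b root:disk 0640
scsi0 sda 8 0 b root:disk 0640
"""

devices = parse_devicelist(SAMPLE)
```

Keeping owner, group and perm as separate fields makes it easy for a
later step to treat them as defaults only and prefer whatever is
already set in /dev.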

You could make it more involved and do things like the much wanted

ide0    ide/disk/c0ps1      hda1    3   1   b   root:disk   0640
ide1    ide/cdrom/c1ss0     hdd     22  64  b   root:disk   0640
scsi0   scsi/disk/c0t0d0s0  sda     8   0   b   root:disk   0640
        ^^^^^^^^^^^^^^^^^^

Adjust classifications and hierarchy according to taste.[3]

Then, a modified MAKEDEV should run. It'd look in the /proc/devicelist
and create the missing entries in /dev. If the entry exists, but has
wrong major/minor, delete it (after taking note of owner:group + perm)
and then create a new one.
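
The MAKEDEV logic just described could be sketched roughly as follows.
To keep it runnable without root, the sketch only computes a plan (a
list of actions) from two dictionaries instead of touching /dev; the
function name plan_makedev and the tuple layout are assumptions made
for this example, not an existing MAKEDEV interface.

```python
def plan_makedev(wanted, existing):
    """Return the actions needed to bring /dev in line with the device list.

    wanted:   {name: (kind, major, minor, owner, group, perm)} from the kernel
    existing: {name: (kind, major, minor, owner, group, perm)} found in /dev
    """
    actions = []
    for name in sorted(wanted):
        kind, major, minor, owner, group, perm = wanted[name]
        current = existing.get(name)
        if current is None:
            # Missing entry: create it with the kernel's suggested defaults.
            actions.append(("mknod", name, kind, major, minor, owner, group, perm))
        elif current[:3] != (kind, major, minor):
            # Wrong numbers (this sketch also treats a wrong c/b type as wrong):
            # note owner, group and perm, delete the node, then recreate it.
            kept = current[3:]
            actions.append(("unlink", name))
            actions.append(("mknod", name, kind, major, minor) + kept)
        # Otherwise the node is already correct and is left alone.
    return actions


wanted = {
    "hda": ("b", 3, 0, "root", "disk", 0o640),
    "hda1": ("b", 3, 1, "root", "disk", 0o640),
    "sda": ("b", 8, 0, "root", "disk", 0o640),
}
existing = {
    "hda": ("b", 3, 0, "root", "disk", 0o640),
    "sda": ("b", 8, 16, "alice", "users", 0o600),  # stale numbers, custom perms
}

plan = plan_makedev(wanted, existing)
```

Here hda is untouched, hda1 is created with the defaults, and sda is
recreated as 8:0 while keeping alice:users and mode 0600. Carrying out
the plan for real would need something like os.unlink, os.mknod and
os.chown, run as root.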

The hard part is kerneld. During boot, a module would register in the
device list, but with major:minor:type set to a magic kerneld device,
10:255 or something. I hope that the kernel knows which inode (e.g.,
"/dev/hdd") was accessed to cause kerneld to kick in -- this is a big
unknown for me.[4]

Kerneld can then look in the device list, find the module, fire it up.
The module will be allocated one or more major:minors, and updates the
device list. insmod would then run "MAKEDEV ide1" or similar so that
/dev is up to date. Similarly, rmmod must change all relevant device
files back to the magic kerneld device 10:255.
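
The life cycle of a device list entry under this scheme can be
simulated in a few lines. This is a toy model only: the DeviceList
class and its methods are invented for illustration, 10:255 is the
placeholder number from the text, and how the kernel would tell
kerneld which name was accessed is left open, as noted above.

```python
KERNELD_MAGIC = (10, 255)  # placeholder "magic kerneld device" from the text


class DeviceList:
    """Toy model of the device list as modules come and go."""

    def __init__(self):
        self.entries = {}  # devname -> {"module": str, "numbers": (maj, min)}

    def register(self, module, devnames):
        """At boot: a not-yet-loaded module claims names via the magic device."""
        for name in devnames:
            self.entries[name] = {"module": module, "numbers": KERNELD_MAGIC}

    def module_for(self, devname):
        """What kerneld needs to know: which module serves this name."""
        return self.entries[devname]["module"]

    def insmod(self, module, allocated):
        """Module loaded: record its real numbers, return names to refresh."""
        for name, numbers in allocated.items():
            if self.entries[name]["module"] != module:
                raise ValueError(name + " is not owned by " + module)
            self.entries[name]["numbers"] = numbers
        return sorted(allocated)

    def rmmod(self, module):
        """Module unloaded: point all of its names back at the magic device."""
        reset = []
        for name, entry in self.entries.items():
            if entry["module"] == module:
                entry["numbers"] = KERNELD_MAGIC
                reset.append(name)
        return sorted(reset)


devlist = DeviceList()
devlist.register("ide1", ["hdc", "hdd"])
wanted_module = devlist.module_for("hdd")
refreshed = devlist.insmod("ide1", {"hdc": (22, 0), "hdd": (22, 64)})
loaded_numbers = devlist.entries["hdd"]["numbers"]
reset = devlist.rmmod("ide1")
```

The names returned by insmod and rmmod are the ones for which MAKEDEV
would have to be rerun so that /dev follows the list.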

The nice thing about this system is that it is relatively easy to
understand, and it supports symlinks and weird permissions without
special cases.

Kjetil T.

[1] One concern is that e.g. the pseudo tty driver alone can generate
hundreds of entries in that list. This calls for a more efficient
way of encoding the dev info in the list. Perhaps there should be
a function in the statically linked part of a module to enumerate
the available minors with suitable path names.

[2] It's unfortunate that group names are not standardized, and I
don't like putting them in the kernel, really.

[3] If you like, you can make a tree like /proc/dev/disk/ide/c0ps1
which has all the default values set. But the "ide0" and "hda1"
information must be encoded somewhere, "hda1" if only for
backwards compatibility. This could be done emitting that info
when the file is read (yes, ignore that stat() claims it's a block
special file).

[4] If you have the enumeration functions mentioned in [1], you'll
need a corresponding query function to locate the module. Lookup
by major number can't be used either way.
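
The enumeration and query functions from [1] and [4] might look
something like this in outline. It is a user-space sketch of the idea,
not kernel code: the ENUMERATORS table and find_module are
hypothetical, and the pty names follow the traditional
pty[p-za-e][0-9a-f] scheme for up to 256 masters.

```python
def pty_masters(count=256):
    """Enumerate (devname, minor) for pty masters without storing a list."""
    letters = "pqrstuvwxyzabcde"
    digits = "0123456789abcdef"
    for minor in range(min(count, 256)):  # the naming scheme covers 256 names
        yield ("pty" + letters[minor // 16] + digits[minor % 16], minor)


def psaux_device():
    """A driver with a single device enumerates just one entry."""
    yield ("psaux", 1)


# Hypothetical registry: module name -> its enumeration function.
ENUMERATORS = {
    "pty": pty_masters,
    "psaux": psaux_device,
}


def find_module(devname, enumerators):
    """Query function from [4]: locate the module (and minor) by path name."""
    for module, enumerate_minors in enumerators.items():
        for name, minor in enumerate_minors():
            if name == devname:
                return (module, minor)
    return None


masters = list(pty_masters())
hit = find_module("ptyq0", ENUMERATORS)
miss = find_module("nosuchdev", ENUMERATORS)
```

The point of the generator is that the hundreds of pty entries never
have to sit in the list itself; they are produced on demand, and the
lookup goes by name rather than by major number.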