Re: SCSI device numbering (was: Re: Ideas for v2.1

Leonard N. Zubkoff (lnz@dandelion.com)
Tue, 2 Jul 1996 16:33:23 -0700


I gather I haven't explained my reasoning all that well, so let me try again...

Let's step back for a moment and consider our goals for user and kernel device
naming and then see how they impact our implementation options. I claim the
following goal is worthwhile:

In the event a particular host adapter or peripheral fails and the system
reboots unattended, the names of the remaining peripheral devices as they would
be used in /etc/fstab or in an explicit fsck command should not change.
Therefore, while particular file systems might not be available, no potentially
disastrous incorrect mounts will take place.

Furthermore, in the event that a device or host adapter is added or removed
from the system, the name of any other device should not change unless it has
been moved or its host adapter has been moved.

Now the mapping between device names and physical peripheral devices is really
a two stage process:

name ==> dev_t ==> device/partition

Most of the kernel services expect a device name and derive the dev_t and
device/partition from it. The dev_t values are revealed in the stat structure,
but, apart from calls such as mknod that create special files, I don't believe
there is any system call which accepts a dev_t from the user.
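
A minimal sketch of the first stage (name ==> dev_t), as seen from user space:
the function below resolves a name with stat() and reports the device number
recorded in the special file. It assumes a glibc-style <sys/sysmacros.h> that
provides major() and minor(); lookup_dev is a hypothetical helper name, not an
existing interface.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor(): glibc-specific assumption */
#include <sys/types.h>

/*
 * Resolve a name to the dev_t recorded in its special file.
 * Returns 0 and fills *maj and *min on success; returns -1 if the name
 * cannot be stat'ed or is not a block or character special file.
 */
int lookup_dev(const char *name, unsigned *maj, unsigned *min)
{
    struct stat st;

    assert(name && maj && min);
    if (stat(name, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return -1;

    /* st_rdev is the device the special file names; st_dev would instead be
       the device holding the file system that contains the /dev entry. */
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

The second stage (dev_t ==> device/partition) happens inside the kernel and is
not visible through this interface.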

The important point about the above goal is that it requires the composite
name ==> device mapping to remain unchanged when a device or host adapter
fails. If we require that the name ==> dev_t mapping be provided solely by the
entries in /dev, as is presently the case, then because the root file system is
mounted read only at the time fsck runs, the name ==> dev_t mapping is fixed
across a reboot. Thus, for the composite name ==> device mapping to remain
unchanged, the dev_t ==> device mapping must also not change.

This implies that the dev_t ==> device mapping must not be based on anything
that can be affected by a device or host adapter failure. The present strategy
of assigning dev_t ordinal values based on the order in which devices are found
clearly violates this goal. The mapping between devices and dev_t values could
be based on the location of the host adapter and device, so long as this
information is available:

*** PCI Host Adapters ***

(Bus Type = PCI, bus, device, function, channel, target, lun, partition)

I believe the above 8-tuple is sufficient to serve as a minor number for any
SCSI device accessible via a PCI SCSI host adapter. None of these elements
depends on the number or type of installed PCI host adapters. The bus and
device are properties of the physical slot in which the host adapter is
installed, while the function and channel are properties of the host adapter
implementation.

*** ISA Host Adapters ***

(Bus Type = ISA, I/O address, channel, target, lun, partition)

This is feasible only if the I/O address is a constant. For many boards it
will not change without user intervention. I am not clear on whether ISA
Plug & Play allows resource assignments to change arbitrarily on reboot as
PCI does. That is, can the disappearance of an ISA device cause the
resources assigned to other ISA devices to change? How about the addition
of a host adapter?

We cannot use slot numbers for ISA since we cannot tell one slot from
another. For EISA, I believe we could use an approach similar to PCI since
one can identify host adapters by their slot number.

It's reasonably straightforward to encode dev_t from the above n-tuples and the
major number once we know the limits on the individual elements. The problem
is encoding this information in 32 bits.
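
To make the size problem concrete, here is a rough sketch that packs the PCI
8-tuple into an integer and counts the bits. The bus, device, and function
widths follow the PCI limits (256 buses, 32 devices, 8 functions); every other
width, the 8-bit major number, and all of the names are assumptions chosen
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths in bits. Only the three PCI fields are fixed by the bus;
   the rest are assumed limits for this sketch. */
enum {
    BUS_TYPE_BITS  = 2,  /* assumed: PCI, ISA, EISA, one spare */
    PCI_BUS_BITS   = 8,
    PCI_DEV_BITS   = 5,
    PCI_FUNC_BITS  = 3,
    CHANNEL_BITS   = 2,  /* assumed: up to 4 channels per adapter */
    TARGET_BITS    = 4,  /* assumed: 16 targets (wide SCSI) */
    LUN_BITS       = 3,  /* assumed: 8 LUNs per target */
    PARTITION_BITS = 4,  /* assumed: 16 partitions per disk */
    MAJOR_BITS     = 8,  /* assumed width of the major number */
    LOCATION_BITS  = BUS_TYPE_BITS + PCI_BUS_BITS + PCI_DEV_BITS +
                     PCI_FUNC_BITS + CHANNEL_BITS + TARGET_BITS +
                     LUN_BITS + PARTITION_BITS
};

struct pci_scsi_location {
    unsigned bus_type, bus, device, function, channel, target, lun, partition;
};

/* Shift the accumulator left and append one field, checking its range. */
static uint64_t append_field(uint64_t acc, unsigned value, unsigned bits)
{
    assert(value < (1u << bits));
    return (acc << bits) | value;
}

/* Pack the 8-tuple into the low LOCATION_BITS bits, bus type first. */
uint64_t encode_pci_location(const struct pci_scsi_location *loc)
{
    uint64_t v = 0;

    v = append_field(v, loc->bus_type,  BUS_TYPE_BITS);
    v = append_field(v, loc->bus,       PCI_BUS_BITS);
    v = append_field(v, loc->device,    PCI_DEV_BITS);
    v = append_field(v, loc->function,  PCI_FUNC_BITS);
    v = append_field(v, loc->channel,   CHANNEL_BITS);
    v = append_field(v, loc->target,    TARGET_BITS);
    v = append_field(v, loc->lun,       LUN_BITS);
    v = append_field(v, loc->partition, PARTITION_BITS);
    return v;
}

/* Nonzero if the major number plus the packed tuple fit in 32 bits. */
int fits_in_32_bits(void)
{
    return MAJOR_BITS + LOCATION_BITS <= 32;
}
```

With these assumed widths the tuple alone takes 31 bits, so adding an 8-bit
major number gives 39 bits, which does not fit in a 32-bit dev_t.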

Another alternative to the above strategy is to consider the location of the
device irrelevant and try to base dev_t on a unique property of the device
itself. Unfortunately, such a strategy probably requires a very large dev_t.
This would be possible if we used bignums (arbitrary precision integers) for
dev_t, but that is not a very realistic approach (e.g. bignums aren't standard
C types).

What I've been arguing for is to allow both the name ==> dev_t and dev_t ==>
device mappings to change, and keep only the composite name ==> device mapping
constant. The advantage of this strategy is that it allows for a smaller
dynamic dev_t, since the number of actual devices we can connect to any real
system is vastly smaller than the theoretical possibilities expressed above. A
further advantage is that it may allow a device's name to remain unchanged even
if the device is moved between host adapters, channels, or even has its target
or lun changed.
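
One way to picture this strategy is the sketch below: a table binds each name
to some stable identity for the device, and the device numbers are handed out
afresh at every boot. The struct, the function names, and the use of a string
as the stable identity are all hypothetical; the identity could equally be a
location tuple or a property of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical boot-time table. */
struct binding {
    const char *identity;  /* assumed stable key for the device */
    const char *name;      /* name as it would appear in /etc/fstab */
    int dev;               /* this boot's dynamic number, -1 if absent */
};

/* Give each device found at this boot a dense number in probe order and
   record it against the matching identity; missing devices get -1. */
void assign_dynamic(struct binding *tab, int ntab,
                    const char *const *found, int nfound)
{
    int i, j;

    assert(tab && found);
    for (i = 0; i < ntab; i++)
        tab[i].dev = -1;
    for (j = 0; j < nfound; j++)
        for (i = 0; i < ntab; i++)
            if (strcmp(tab[i].identity, found[j]) == 0)
                tab[i].dev = j;
}

static const struct binding *find_name(const struct binding *tab, int ntab,
                                       const char *name)
{
    int i;

    for (i = 0; i < ntab; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* name ==> dev_t for this boot, or -1 if the device is absent. */
int dev_of(const struct binding *tab, int ntab, const char *name)
{
    const struct binding *b = find_name(tab, ntab, name);

    return b ? b->dev : -1;
}

/* Composite name ==> device: the identity reached through the name,
   or NULL if that device was not found at this boot. */
const char *resolve(const struct binding *tab, int ntab, const char *name)
{
    const struct binding *b = find_name(tab, ntab, name);

    return (b && b->dev >= 0) ? b->identity : NULL;
}
```

If the first of two disks is missing at the next boot, dev_of() for the second
disk's name changes from 1 to 0, yet resolve() still returns the same device,
and the missing disk's name resolves to nothing rather than to the wrong disk.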

I believe it's clear that to meet our goal above, we must either abandon
dynamic assignment or abandon a fixed name ==> dev_t mapping.

Now I certainly agree that the kernel should not set policy, but to a certain
degree it has to. The present ordinal mapping of probed devices into dev_t
entries is certainly a policy decision: it utterly precludes meeting the goal
above.
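
A small sketch of why ordinal assignment conflicts with the goal: devices are
probed in a fixed order and numbered as they are found, so a failure shifts
every later device down by one. The function name and the present[] flags are
hypothetical and model only the probe order, not any real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the devices in probe order and return the one that receives
   ordinal n, skipping devices that failed; NULL if there is none. */
const char *device_at_ordinal(const char *const *devices, const int *present,
                              int count, int n)
{
    int i, ordinal = 0;

    assert(devices && present);
    for (i = 0; i < count; i++) {
        if (!present[i])
            continue;          /* a failed device consumes no ordinal */
        if (ordinal == n)
            return devices[i];
        ordinal++;
    }
    return NULL;
}
```

With three disks present, ordinal 1 is the second disk. If the first disk
fails, ordinal 1 becomes the third disk, so a fixed /dev entry that used to
reach the second disk now reaches the wrong one.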

There's no reason we could not implement a new RAM file system -- devfs -- to
support this functionality without requiring all the overhead of /proc. Device
entries in this file system would be very lightweight, only having names,
dev_t, owner, and protection information. It would be populated by the kernel
at boot time and by whatever system programs are provided to make convenience
mappings. I expect the remaining issues could be worked out. There's not even
a need for multiple mountings of devfs; any need for a separate device
directory can be handled by constructing a normal directory and making special
files based on what's in devfs.
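
As a rough illustration of how lightweight such entries could be, the sketch
below keeps a flat in-memory table holding only a name, a dev_t, an owner, and
protection bits. It is a user-space model under stated assumptions: the
struct, the function names, the size limits, and the inclusion of a group id
alongside the owner are all hypothetical and do not describe any existing
kernel interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define DEVFS_NAME_MAX    32  /* assumed limit, including the NUL */
#define DEVFS_MAX_ENTRIES 64  /* assumed table size */

/* One entry: name, dev_t, owner, and protection information only. */
struct devfs_entry {
    char   name[DEVFS_NAME_MAX];
    dev_t  dev;
    uid_t  uid;
    gid_t  gid;   /* assumption: group kept alongside the owner */
    mode_t mode;
};

struct devfs {
    struct devfs_entry entries[DEVFS_MAX_ENTRIES];
    int count;
};

/* Find an entry by name; NULL if there is no such entry. */
const struct devfs_entry *devfs_lookup(const struct devfs *fs,
                                       const char *name)
{
    int i;

    assert(fs && name);
    for (i = 0; i < fs->count; i++)
        if (strcmp(fs->entries[i].name, name) == 0)
            return &fs->entries[i];
    return NULL;
}

/* Add an entry, as the kernel would at boot time or a system program
   would later. Returns 0 on success, -1 if the table is full, the name
   is too long, or the name is already present. */
int devfs_add(struct devfs *fs, const char *name, dev_t dev,
              uid_t uid, gid_t gid, mode_t mode)
{
    struct devfs_entry *e;

    assert(fs && name);
    if (fs->count >= DEVFS_MAX_ENTRIES ||
        strlen(name) >= DEVFS_NAME_MAX ||
        devfs_lookup(fs, name) != NULL)
        return -1;

    e = &fs->entries[fs->count++];
    strcpy(e->name, name);  /* length was checked above */
    e->dev = dev;
    e->uid = uid;
    e->gid = gid;
    e->mode = mode;
    return 0;
}
```

Because the table is rebuilt at every boot, the dev_t stored under a given
name is free to differ from one boot to the next.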

Leonard