Re: SCSI device numbering

Albert Cahalan (albert@ccs.neu.edu)
Wed, 3 Jul 1996 03:05:48 -0400 (EDT)


From: "Leonard N. Zubkoff" <lnz@dandelion.com>

> Let's step back for a moment and consider our goals for user
> and kernel device naming and then see how they impact our
> implementation options. I claim the following goal is worthwhile:
>
> In the event a particular host adapter or peripheral fails and
> the system reboots unattended, the names of the remaining
> peripheral devices as they would be used in /etc/fstab or in
> an explicit fsck command should not change. Therefore, while
> particular file systems might not be available, no potentially
> disastrous incorrect mounts will take place.
>
> Furthermore, in the event that a device or host adapter is added
> or removed from the system, the name of any other device should
> not change unless it has been moved or its host adapter has been moved.

Static devices relative to the host adapter would not be hard.
Completely static devices across adapters would be very hard,
and perhaps not even desirable in the two-card case. It is easy
enough to always give the boot disk the same adapter number.

> Now the mapping between device names and physical peripheral
> devices is really a two stage process:
>
> name ==> dev_t ==> device/partition
>
> Most of the kernel services expect a device name and derive the
> dev_t and device/partition from it. The dev_t values are revealed
> in the stat structure, but I don't believe there is any system call
> which accepts a dev_t from the user.
>
> The important point about the above goal is that it requires that
> the name ==> device composite mapping will not change upon failure
> of a device or host adapter. If we require that the name ==> dev_t
> mapping is provided solely by the entries in /dev as is presently
> the case, and since the root file system is mounted read only at
> the time fsck runs, this implies that the name ==> dev_t mapping
> is fixed across a reboot. Thus in order for the name ==> device
> composite mapping to remain unchanged, the dev_t ==> device mapping
> must also not change.

Not really. If I change my boot host adapter, a dynamic system can
ensure that the boot adapter always gets the same ID. Then all the
disks on that adapter get the same dev_t they had before. In a
system with a second adapter, even disks on the second adapter
will always get the same dev_t. Larger (somewhat HUGE) systems can
find space on the first adapter to store a program to fix things up,
with symlinks, /dev reconstruction, or a syscall to swap host adapter
ID codes.
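
The name ==> dev_t step quoted above can be observed from user space,
since stat reveals the dev_t behind a special file. A minimal sketch
in Python; the function names lookup and split_dev are made up for
illustration, while os.stat, os.major and os.minor are the standard
library calls:

```python
import os
import stat

def split_dev(dev):
    """Split a dev_t value into its (major, minor) pair."""
    return os.major(dev), os.minor(dev)

def lookup(name):
    """Resolve a special file name to the dev_t it refers to.

    Returns None if the name is not a block or character special file.
    """
    st = os.stat(name)
    if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
        return st.st_rdev  # dev_t of the device the node refers to
    return None
```

Calling split_dev(lookup("/dev/null")) on a typical Linux system gives
(1, 3). Nothing here goes the other way: the kernel is handed a name,
never a bare dev_t.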

> This implies that the device ==> dev_t mapping must not be based
> on anything that can be affected by a device or host adapter failure.
> The present strategy of assigning dev_t ordinal values based on the
> order in which devices are found clearly violates this goal. The
> mapping between devices and dev_t values could be based on the
> location of the host adapter and device, so long as this information
> is available:
>
> *** PCI Host Adapters ***
>
> (Bus Type = PCI, bus, device, function, channel, target, lun, partition)
>
> I believe the above 8-tuple is sufficient as a minor number for PCI for any
> SCSI device accessible via a PCI SCSI host adapter. None of these elements
> are dependent on the number or type of installed PCI host adapters. The
> bus and device are a property of the physical slot in which the host
> adapter is installed, together with the function and channel which are
> properties of the host adapter implementation.

So when I change my SCSI card, the root filesystem remains read-only
and /usr does not get mounted because they now have new device names.

> *** ISA Host Adapters ***
>
> (Bus Type = ISA, I/O address, channel, target, lun, partition)
>
> This is feasible only if the I/O address is a constant.

We may think that Win95 treats Linux badly now, but we have not seen
real torture yet. Win95 will reassign I/O addresses just for fun.

> It's reasonably straightforward to encode dev_t from the above
> n-tuples and the major number once we know the limits on the
> individual elements. The problem is encoding this information
> in 32 bits.

The problem is that card addresses _will_ change often. It is only
possible to identify the adapter used to boot the system (usually!).
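
To make the 32-bit squeeze concrete, here is a rough sketch that packs
the PCI 8-tuple quoted above into a single integer. The bus, device
and function widths follow the PCI configuration address layout; every
other width is an assumption picked for illustration, not something
proposed in this thread:

```python
# Field widths in bits, most significant field first.
FIELDS = [
    ("bus_type", 1),   # assumed: 0 = ISA, 1 = PCI
    ("bus", 8),        # PCI bus number
    ("device", 5),     # PCI device (slot) number
    ("function", 3),   # PCI function number
    ("channel", 2),    # assumed: up to 4 channels per adapter
    ("target", 4),     # assumed: wide SCSI, 16 targets
    ("lun", 3),        # assumed: 8 LUNs per target
    ("partition", 4),  # assumed: 16 partitions per disk
]

def total_bits():
    """Number of bits the location tuple needs with these widths."""
    return sum(width for _, width in FIELDS)

def pack(**values):
    """Pack one value per field into a single integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word):
    """Recover the field values from a packed integer."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

With these widths total_bits() is 30, which would leave only 2 bits of
a 32-bit dev_t for the major number.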

> What I've been arguing for is to allow both the name ==> dev_t and
> dev_t ==> device mappings to change, and keep only the composite
> name ==> device mapping a constant. The advantage of this strategy
> is that it allows for a smaller dynamic dev_t, since the number of
> actual devices we can connect to any real system is vastly smaller
> than the theoretical possibilities expressed above. A further
> advantage is that it may allow a device's name to remain unchanged
> even if the device is moved between host adapters, channels, or
> even has its target or lun changed.
>
> I believe it's clear that to meet our goal above, we must either
> abandon dynamic assignment or abandon a fixed name ==> dev_t mapping.

Dynamic cards with static devices can give a reasonably stable mapping.
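
A toy model of the two strategies quoted above, for comparison. It
assumes a device location is just an (adapter, target) pair and that
some persistent location-to-name table exists; both are simplifications
invented for this sketch, as are the function names:

```python
def ordinal_names(found):
    """Name disks sda, sdb, ... in the order they were probed."""
    return {f"sd{chr(ord('a') + i)}": loc for i, loc in enumerate(found)}

def stable_names(found, table):
    """Assign dev_t values in probe order, but take each name from a
    persistent location -> name table, so the name follows the device."""
    result = {}
    for dev_t, loc in enumerate(found):
        if loc in table:
            result[table[loc]] = (dev_t, loc)
    return result

# Three disks probed at boot; then the disk at (0, 1) fails.
before = [(0, 0), (0, 1), (1, 0)]
after = [(0, 0), (1, 0)]
table = {(0, 0): "root", (0, 1): "usr", (1, 0): "home"}
```

Under ordinal naming, sdb refers to (0, 1) before the failure and to
(1, 0) after it, which is the incorrect-mount hazard. With the table,
"home" still refers to (1, 0) even though its dev_t drops from 2 to 1,
and "usr" is simply absent.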

> Now I certainly agree that the kernel should not set policy, but
> to a certain degree it has to. The present ordinal mapping of
> probed devices into dev_t entries is certainly a policy decision:
> it utterly precludes meeting the goal above.
>
> There's no reason we could not implement a new RAM file system
> -- devfs -- to support this functionality without requiring all
> the overhead of /proc. Device entries in this file system would
> be very lightweight, only having names, dev_t, owner, and protection
> information. It would be populated by the kernel at boot time and
> by whatever system programs are provided to make convenience
> mappings. I expect the remaining issues could be worked out.
> There's not even a need for multiple mountings of devfs; any need
> for a separate device directory can be handled by constructing
> a normal directory and making special files based on what's in devfs.

That would let us keep a 16-bit dev_t and solve many problems.
Traditionalists hate it though. It's like AIX and Solaris to
them, only worse. Maybe as bad as NT or VMS. I like it though.
You have to remember that some people have fond memories of
Seventh Edition Unix, and even useful change is terrifying.

Using a devfs would require a reduced version, or perhaps a
system call to allocate a pty. I think that is a minor problem.