Re: On the subject of the VFS layer (was Re: VFS questions)

Doug Ledford (
Sun, 4 May 1997 05:48:28 -0500 (CDT)

On Sun, 4 May 1997, Michael Neuffer wrote:

> You contradict yourself within that mail several times. It would make the
> current code even more complex and even less debuggable and bugfree.

That's because I was rambling through several possible answers without
trying to limit it to one all-encompassing "This should do the trick" type
answer.

> > From the standpoint of the aic7xxx driver, I know that a callin routine to
> > let us know that a command has timed out, and allowing us to provide a
> > suggestion as to the next course of action would be advantageous even on
> > that streamlined controller. Currently, we issue one in every 200
> > commands to a particular device as an ordered queue tag in order to keep
> > any particular command from being placed too far onto the back burner as
> > part of the drive's queue optimization. Knowing that a command timed out
> > would allow us to then send an ordered tag command to force the command to
> > complete without having to go through an abort/reset routine in order to
> > resurrect the command. This would also allow us to skip the once every so
> > often method of sending ordered queue tags so the drives could be more
> > efficient in how they handled tagged queueing. Could the same thing be
> > used under the DPT driver to solve these problems you are mentioning?
>
> No, and here you can see one of those things I told you about before.....
> EATA compatible controllers like the DPT do not want to be fed with any
> tags. The controller analyzes the incoming "stream" of commands
> and does the necessary tagging by itself. Whether the command ultimately
> reaches a drive within a RAID array or a "normal" drive doesn't matter.

That's irrelevant. The mid level SCSI code doesn't pass us a mandatory
tag value, it just passes a Scsi_Cmnd * that the driver then uses to build
its own internal command. This is all hardware/driver dependent,
very normal, and most importantly, it's already a black box
representation. The only assumption that the mid level scsi code makes is
that it's talking to a scsi device. I agree that it has been written to
treat them as a SCSI black box, but nonetheless, it still leaves it up
to the driver to handle the actual hardware communications based upon two
very important things. First, it passes the actual command. Second, it
passes the Scatter Gather blocks that are to be used. Beyond that, a low
level driver isn't really tied to doing anything in any certain fashion.
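
As a rough illustration of that hand-off, here is a minimal sketch in C.
Every struct and function name in it is hypothetical and greatly
simplified; it is not the kernel's real Scsi_Cmnd layout or any driver's
real code. It only shows a low level driver building its own private
request from the two things it is handed: the command bytes and the
scatter-gather list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real structures. */
struct sg_entry {
    void  *address;              /* where this chunk lives in memory */
    size_t length;               /* size of the chunk in bytes */
};

struct cmd_sketch {
    unsigned char    cdb[12];    /* SCSI command bytes, meant for the drive */
    unsigned char    cdb_len;    /* number of valid bytes in cdb */
    struct sg_entry *sg;         /* scatter-gather list from the upper layers */
    unsigned int     sg_count;   /* number of entries in sg */
};

/* Private request a low level driver might build for its own hardware. */
struct hw_request_sketch {
    unsigned char cdb[12];
    unsigned char cdb_len;
    unsigned int  segments;      /* scatter-gather entries to program */
    size_t        total_bytes;   /* sum of all segment lengths */
};

/* Copy the command untouched and summarize the scatter-gather list.
 * Returns 0 on success, -1 if the command does not fit. */
int build_hw_request(const struct cmd_sketch *cmd,
                     struct hw_request_sketch *req)
{
    unsigned int i;

    assert(cmd != NULL && req != NULL);
    if (cmd->cdb_len > sizeof(req->cdb))
        return -1;

    memcpy(req->cdb, cmd->cdb, cmd->cdb_len);
    req->cdb_len = cmd->cdb_len;

    req->segments = cmd->sg_count;
    req->total_bytes = 0;
    for (i = 0; i < cmd->sg_count; i++)
        req->total_bytes += cmd->sg[i].length;

    return 0;
}
```

How the driver then feeds that private request to its hardware (tags or
no tags, mailboxes, whatever) is entirely its own business.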

> You have to look at such intelligent controllers as black boxes:
> At one end you stick in an _IO-request_ and on the other end the box will
> spit out a result or an error message.

Hmmm....I guess you'll have to define _IO-request_. If by IO-request you
*don't* mean a scsi command and a sg list, then we aren't really talking
to a scsi device (which I think goes without saying for the DPT driver
anyway since it isn't a scsi controller, it's an EATA controller that uses
SCSI devices transparently as far as I know). I guess my point here is
that you're advocating we do away with the mid level scsi code because it
gets in the way of your black box representation. At this point (without
the benefit of further clarification) I'm beginning to wonder whether the
DPT driver should even be registered as a scsi controller, not whether we
should do away with the scsi mid level code. With the implementation of a
callin routine for timeout suggestions, you really remove the last portion
of strict control from the mid level scsi code and allow low level drivers
free rein to do things as they wish excepting only the actual command and
the sg list (which a person could re-write if they wanted to in their low
level driver, but for scsi controllers, there is no need to).
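
To show what I mean by a callin routine for timeout suggestions, here is
a minimal sketch in C. The names and the escalation policy (ordered tag
first, then abort, then reset) are assumptions made up for illustration;
nothing like this exists in the current mid level code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of suggestions a low level driver could hand back. */
enum timeout_suggestion {
    SUGGEST_ORDERED_TAG,   /* send an ordered tag to flush the stuck command */
    SUGGEST_ABORT,         /* abort the command */
    SUGGEST_RESET          /* reset the device or bus */
};

/* Hypothetical summary of the command that timed out. */
struct timed_out_cmd {
    int previous_timeouts;     /* how often this command already timed out */
    int device_supports_tags;  /* nonzero if tagged queueing is in use */
};

typedef enum timeout_suggestion
        (*timeout_hook)(const struct timed_out_cmd *cmd);

/* What a tagged-queueing driver might suggest (assumed policy). */
enum timeout_suggestion tagged_driver_suggest(const struct timed_out_cmd *cmd)
{
    assert(cmd != NULL);
    if (cmd->device_supports_tags && cmd->previous_timeouts == 0)
        return SUGGEST_ORDERED_TAG;
    if (cmd->previous_timeouts <= 1)
        return SUGGEST_ABORT;
    return SUGGEST_RESET;
}

/* Mid level side: ask the driver if it registered a hook; otherwise
 * fall back to an abort (the fallback is an assumption of this sketch). */
enum timeout_suggestion midlevel_on_timeout(timeout_hook hook,
                                            const struct timed_out_cmd *cmd)
{
    if (hook == NULL)
        return SUGGEST_ABORT;
    return hook(cmd);
}
```

The mid level code still owns the timers; the driver only supplies the
decision about what to try next.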

More directly to the point is this. The scsi mid and upper level code
really only handle a few things. They create scsi commands for the actual
devices based upon the needed information and they organize the buffers
into a scatter gather list. The scsi command itself is drive dependent,
not controller dependent and is intended to be passed to the drive
untouched. The scatter gather list is somewhat controller dependent, but
only to the extent that it tells the controller where to put each 512 byte
(or larger) chunk of information in actual memory so that a read doesn't
have to be copied around (writes as well for that matter). After it
creates this information and checks to make sure that the low level driver
doesn't already have its maximum number of commands outstanding, it passes
the command to the low level driver. It then handles setting a timeout
value on the command itself so it will have an idea if the command is
overdue for completion. If the command times out, then it calls the low level
driver for any abort/reset actions that may be needed. If the command
completes normally, then it is passed up to the higher levels after a few
sanity checks as being complete. As long as you are talking to SCSI
devices, you can't get much more black box than that. The only exception
is the one I've already proposed a solution to which is the handling of
the timeout circumstance. Now, for all of these streamlined SCSI
controllers, this is a very competent level of abstraction that takes the
handling of the non-controller dependent work out of the hands of the low
level driver. It also leaves the controller free to do what it needs to
do. Why lambast this abstraction layer when the real problem you are
running into is not that the abstraction layer is broken, but that the
controller you are working on doesn't truly fit underneath this
abstraction layer?
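
The flow described above can be sketched in a few lines of C. This is a
toy model under stated assumptions: the structs, the counters and the
function names are hypothetical, and the abort call is represented by a
counter rather than a real driver entry point.

```c
#include <assert.h>

enum dispatch_result { DISPATCHED, HOST_BUSY };

/* Hypothetical per-controller bookkeeping kept by the mid level code. */
struct host_sketch {
    int max_outstanding;    /* most commands the driver accepts at once */
    int outstanding;        /* commands currently owned by the driver */
    int aborts_requested;   /* stand-in for calls to the abort/reset routine */
    int completed;          /* commands handed back to the upper levels */
};

/* Hypothetical per-command bookkeeping. */
struct pending_cmd {
    long deadline;          /* time at which the command counts as overdue */
    int  active;            /* nonzero while the driver owns the command */
};

/* Hand a command to the driver unless it is already at its limit. */
enum dispatch_result midlevel_dispatch(struct host_sketch *h,
                                       struct pending_cmd *c,
                                       long now, long timeout)
{
    if (h->outstanding >= h->max_outstanding)
        return HOST_BUSY;
    h->outstanding++;
    c->deadline = now + timeout;
    c->active = 1;
    return DISPATCHED;
}

/* Returns 1 and requests abort/reset handling if the command is overdue. */
int midlevel_check_timeout(struct host_sketch *h,
                           const struct pending_cmd *c, long now)
{
    if (!c->active || now < c->deadline)
        return 0;
    h->aborts_requested++;
    return 1;
}

/* Normal completion: release the slot and pass the command upward. */
void midlevel_complete(struct host_sketch *h, struct pending_cmd *c)
{
    assert(c->active);
    c->active = 0;
    h->outstanding--;
    h->completed++;
}
```

Nothing in that flow cares how the controller talks to the drive, which
is the point.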

> If you've ever used a mainframe imagine a DPT controller to be a "little
> brother" of an IBM IO-Channel. All this great logic that you want to apply
> from the aic7xxx or NCR or Buslogic simply does not fit.

And this is the same reason why the DPT controller "simply does not fit"
the description of a SCSI controller as the mid level code was intended to
handle. This doesn't mean we should scrap the mid level code, it means
the DPT driver shouldn't be registered underneath it.

> Or just imagine SSA hardware or a controller from Genrocco.........
> > As
> > a sidebar, I would consider it advantageous to only implement the
> > suggestion routine as opposed to a full timeout mechanism. In this way,
> > you only have to write the logic that determines what should be done, and
> > not the logic to handle the timeouts themselves.
> You can reduce this to something very simple:
> DONE: Command finished
> RETRY: Retry command, requeue
> QUEUE FULL/TRY LATER: Command could not be queued, requeue
> ERROR/COMMAND DEAD: An error occurred, the command is dead, this
> information must go up to the requester.
> Then use routines like the current ones or your own to do the
> timeout and error handling.
> What you must realize is that the hardware is getting more and more
> intelligent and will not stand still at the level of a Buslogic.

This may be true for high end controllers, but the average home user and
the average driver for the home user don't need the level of complexity
you are advocating. Furthermore, Gateway 2000 isn't going to start
shipping their next model of computer with DPT controllers standard. We
really are talking about specialized/high end hardware here, and I see no
need to scrap the current method of support for the more common hardware
in the process of properly supporting high end hardware.

> The current midlevel code has been suffering for a long time under
> a tight skin syndrome where you can't control the side effects
> anymore. The solution is not to put even more stuff into the midlevel
> layer, but instead to pull it out and simplify. There are no clean
> layers. The code must be put back where it belongs.

I think a more appropriate solution would be to have a new block device
driver that does what you want. It passes a raw IO-request to the low
level driver and the low level driver then tells it a result. The low
level driver would be responsible for all timeouts, housecleaning,
queueing, error handling/recovery, etc. There is no reason to yank these
things from the mid level scsi code where they currently fit quite well
for scsi drivers. Instead you allow special/intelligent devices the
ability to build their own drivers without the hindrance of the mid level
code and leave it in place for those that are quite content to run
underneath it. In this fashion, you can create a new block driver
hierarchy similar to the scsi hierarchy, but that gives these intelligent
controllers the freedom to do what they need to do. You would register
your driver with this minimal abstraction layer at boot time or insmod
time (the abstraction layer is really to avoid having to allocate a bunch
of different major numbers when you can have just one that transparently
sends commands to the intelligent driver).
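
A minimal sketch of such a layer in C follows. Everything in it is
hypothetical: the names, the fixed-size slot table, and the example
driver with its made-up 256 sector limit. The result codes are taken
from your DONE / RETRY / TRY LATER / DEAD list above.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes borrowed from the list quoted earlier in this thread. */
enum io_result { IO_DONE, IO_RETRY, IO_TRY_LATER, IO_DEAD };

/* Hypothetical raw IO-request: no SCSI command, just what to move where. */
struct io_request_sketch {
    int           write;     /* nonzero for a write, zero for a read */
    unsigned long sector;    /* starting sector */
    unsigned long count;     /* number of sectors */
    void         *buffer;    /* data buffer */
};

typedef enum io_result (*io_submit_fn)(struct io_request_sketch *req);

#define MAX_SMART_DRIVERS 4
static io_submit_fn smart_drivers[MAX_SMART_DRIVERS];

/* Register an intelligent driver; returns its slot, or -1 if full. */
int smart_register(io_submit_fn fn)
{
    int i;

    if (fn == NULL)
        return -1;
    for (i = 0; i < MAX_SMART_DRIVERS; i++) {
        if (smart_drivers[i] == NULL) {
            smart_drivers[i] = fn;
            return i;
        }
    }
    return -1;
}

/* Route a request to the driver in the given slot. The layer does no
 * timeout or error handling of its own; that is all left to the driver. */
enum io_result smart_submit(int slot, struct io_request_sketch *req)
{
    assert(req != NULL);
    if (slot < 0 || slot >= MAX_SMART_DRIVERS || smart_drivers[slot] == NULL)
        return IO_DEAD;
    return smart_drivers[slot](req);
}

/* Example driver: refuses oversized requests (made-up limit). */
enum io_result example_driver_submit(struct io_request_sketch *req)
{
    if (req->count > 256)
        return IO_TRY_LATER;
    return IO_DONE;
}
```

The slot number plays the role a minor number range would play under the
single shared major.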

* Doug Ledford * Unix, Novell, Dos, Windows 3.x, *
* 873-DIAL * WfW, Windows 95 & NT Technician *
* PPP access $14.95/month *****************************************
* Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
* communities. Sign-up online at * Web page creation and hosting, other *
* 873-9000 V.34 * services available, call for info. *