Re: SCSI access creating lost time

Doug Ledford (dledford@redhat.com)
Sun, 07 Mar 1999 18:53:11 -0500

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Paul Jakma: "Re: Grr.."
Previous message: Linus Torvalds: "Re: [PATCH] a.out don't exec over NFS"
Maybe in reply to: Buddha Buck: "SCSI access creating lost time"
Next in thread: Doug Ledford: "Re: SCSI access creating lost time"

Alan Cox wrote:

> > already deal with. Secondly, be prepared to experience weird things
> > when dropping these locks. If you get things right then that's great,
> > but if you don't it may not show up until you have two cards in the same
> > machine that are active at the same time or other similar scenarios to
> > force the possible newly introduced bug up to the front.
>
> Ok I just wanted to be sure there wasnt some fundamental midlayer reason for
> never dropping it. The right answer is to drop the midlayer, from a great
> height, but thats 2.3 stuff. I realise that.

For any people wishing to look into their SCSI drivers that are causing
time loss, the following rules apply in regards to Alan's comments on
the mid-level SCSI code:

1. If you drop the global io_request_lock, then you must either be
prepared to have your code re-entered or you must provide some alternate
form of locking. Note possible deadlock issues here are *very*
abundant. The global io_request_lock is the *only* thing the mid-level
SCSI code uses to make sure it doesn't call into your code
re-entrantly. And even then it's not a total guarantee. Be aware that
the mid-level SCSI code can call back into your code from the
scsi_done() routine, so an interrupt can generate calls into the
queue_command() routine, queue() routine, or the reset() routine. Be
prepared for such calls when doing your own locking.

2. You must *always* re-aquire the spin lock on the io_request_lock
before you make any calls into the mid-level code. The mid-level code
expects that all low level drivers will be holding this lock when they
call into it for anything other than init functions and relies upon that
to perform it's own locking. Failure to do this will result in the
mid-level code eventually hosing things horribly.

3. Be prepared to handle deadlocks if you drop the lock and do your own
locking. My experimental code in this area used a custom
recursion-allowed-on-the-same-cpu spin lock for this purpose. It's
still in the aic7xxx driver, but ifdef'ed out at the moment. In
general, realize that if you use any of your own locks, then you must
drop all of your own locks before trying to re-aquire the global
io_request_lock, etc. Be aware of all possible entry points in your
driver and any interdependancies when doing this as well (such as "If I
drop the io_request_lock in my driver and spin in the queue routine,
then what happens if I get an interrupt? Will the interrupt handler
call scsi_done() and result in another command in the queue routine, and
if so, what will happen when it hits my locking and will it trounce on
variables, etc?").

Now, having said things so wordly to express the warning I would give
people, the short version is that the mid-level code not only uses the
global io_request_lock spinlock to guarantee that the low level driver
is treated as a single threaded driver, it also does it's own internal
locking with it. So, dropping the lock has more implications than
simply in the low level driver. It also means that SMP programming is
much more fickle once you drop that lock. The design was to be failsafe
for 2.2, and fix it properly in 2.3. The alternative was SCSI race
conditions all through 2.2.

-- Doug Ledford <dledford@redhat.com> Opinions expressed are my own, but they should be everybody's.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/

Next message: Paul Jakma: "Re: Grr.."
Previous message: Linus Torvalds: "Re: [PATCH] a.out don't exec over NFS"
Maybe in reply to: Buddha Buck: "SCSI access creating lost time"
Next in thread: Doug Ledford: "Re: SCSI access creating lost time"