Re: [RFC v3 01/15] scsi: Drop struct Scsi_Host->host_lock usage inscsi_dispatch_cmd()

From: Nicholas A. Bellinger
Date: Fri Sep 24 2010 - 17:17:16 EST


On Fri, 2010-09-24 at 16:10 -0500, Brian King wrote:
> On 09/24/2010 03:44 PM, Nicholas A. Bellinger wrote:
> > On Fri, 2010-09-24 at 08:41 -0500, Brian King wrote:
> >> On 09/23/2010 06:37 PM, Nicholas A. Bellinger wrote:
> >>> @@ -651,7 +655,6 @@ static inline void scsi_cmd_get_serial(struct Scsi_Host *host, struct scsi_cmnd
> >>> int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> {
> >>> struct Scsi_Host *host = cmd->device->host;
> >>> - unsigned long flags = 0;
> >>> unsigned long timeout;
> >>> int rtn = 0;
> >>>
> >>> @@ -736,15 +739,11 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> scsi_done(cmd);
> >>> goto out;
> >>> }
> >>> -
> >>> - spin_lock_irqsave(host->host_lock, flags);
> >>> /*
> >>> - * AK: unlikely race here: for some reason the timer could
> >>> - * expire before the serial number is set up below.
> >>> - *
> >>> - * TODO: kill serial or move to blk layer
> >>> + * Note that scsi_cmd_get_serial() used to be called here, but
> >>> + * now we expect the legacy SCSI LLDs that actually need this
> >>> + * to call it directly within their SHT->queuecommand() caller.
> >>> */
> >>> - scsi_cmd_get_serial(host, cmd);
> >>>
> >>> if (unlikely(host->shost_state == SHOST_DEL)) {
> >>> cmd->result = (DID_NO_CONNECT << 16);
> >>> @@ -753,7 +752,7 @@ int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
> >>> trace_scsi_dispatch_cmd_start(cmd);
> >>> rtn = host->hostt->queuecommand(cmd, scsi_done);
> >>> }
> >>> - spin_unlock_irqrestore(host->host_lock, flags);
> >>> +
> >>> if (rtn) {
> >>> trace_scsi_dispatch_cmd_error(cmd, rtn);
> >>> if (rtn != SCSI_MLQUEUE_DEVICE_BUSY &&
> >>
> >> Are you planning a future revision that moves the acquiring of the host lock
> >> into the LLDD's queuecommand for all the other drivers you don't currently
> >> touch in this patch set?
> >>
> >
> > Hi Brian,
> >
> > I was under the impression that this would be unnecessary for the vast
> > majority of existing LLD drivers, but if you are aware of specific LLDs
> > that would still need the struct Scsi_Host->host_lock held in their
> > SHT->queuecommand() for whaterver reason please let me know and I would
> > be happy to include this into an RFCv4.
>
> I would think that most drivers might have issues without some pretty careful
> auditing. When Christoph did this for the EH handlers, the first step was to
> simply move acquiring the host lock into the LLDs. That way we can optimize
> drivers one at a time after ensuring they can run lockless in their queuecommand
> handler.
>
> A couple examples of possible issues with drivers I'm familiar with (ibmvfc, ipr):
>
> * Some drivers will do list manipulation for resources needed to send commands. If
> done lockless, this could result in list corruption with multiple readers/writers.
>
> * Some drivers check the state of the hardware before sending a command. Failing to
> do this when the hardware is being reset may result in nasty things like PCI bus
> errors or even sending a command to the wrong device.
>

Indeed, I can very much see some older LLDs making these types of
assumptions.

> These are all the sorts of errors that will be very difficult to hit but have
> pretty bad consequence when they are hit.
>

I think pretty bad would be an under-statement when running into either
of the above two items in ancient LLD code.

In that case, I will re-spin a v4 series that contains a legacy
host_lock held w/ SHT->queuecomand() for all of the "historically high
host_lock optimized in ->queuecommand()" LLDs that are in RFCv3, and
include the other specific ones (namely mpt/fusion and mpt2sas) that we
know are safe to drop host_lock.

Many thanks for your invaluable input on some of the non-obvious items
at play here.

Best,

--nab


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/