[PATCH 3/3] firewire: fw-sbp2: retry login if scsi_device wasofflined early

From: Stefan Richter
Date: Sat Jan 26 2008 - 11:45:21 EST


Fixes yet another "can't recognize device" bug.

https://bugzilla.redhat.com/show_bug.cgi?id=428554#c16 :
If a bus reset happens after the login and SCSI INQUIRY succeeded ---
but before scsi_driver.init_command finished ---, SCSI core would take
the brand new scsi_device offline already, leaving the SBP-2 target
inaccessible.

The proper fix would be to allow sbp2_reconnect to happen in parallel to
__scsi_add_device. This involves intrusive changes to fw-sbp2. Until
then, we use the following simple workaround: Check if the new sdev is
offline; if so, remove the device, logout, and let another login attempt
happen.

Signed-off-by: Stefan Richter <stefanr@xxxxxxxxxxxxxxxxx>
---

Depends on patch 2/3. Has yet to be tested by the Fedora bug reporter.

drivers/firewire/fw-sbp2.c | 40 +++++++++++++++++++++++++++++--------
1 file changed, 32 insertions(+), 8 deletions(-)

Index: linux/drivers/firewire/fw-sbp2.c
===================================================================
--- linux.orig/drivers/firewire/fw-sbp2.c
+++ linux/drivers/firewire/fw-sbp2.c
@@ -716,21 +716,45 @@ static void sbp2_login(struct work_struc
sdev = __scsi_add_device(shost, 0, 0,
scsilun_to_int(&eight_bytes_lun), lu);
if (IS_ERR(sdev)) {
- smp_rmb(); /* generation may have changed */
- generation = device->generation;
- smp_rmb(); /* node_id must not be older than generation */
+ /*
+ * The most frequent cause for __scsi_add_device() to fail
+ * is a bus reset while sending the SCSI INQUIRY. Try again.
+ */
+ goto out_logout_login;

- sbp2_send_management_orb(lu, device->node_id, generation,
- SBP2_LOGOUT_REQUEST, lu->login_id, NULL);
+ } else if (sdev->sdev_state == SDEV_OFFLINE) {
/*
- * Set this back to sbp2_login so we fall back and
- * retry login on bus reset.
+ * FIXME: We are unable to perform reconnects while in
+ * sbp2_login(). Therefore __scsi_add_device() will get
+ * into trouble if a bus reset happens in parallel.
+ * It will either fail (that's OK, see above) or take sdev
+ * offline. Here is a crude workaround for the latter.
*/
- PREPARE_DELAYED_WORK(&lu->work, sbp2_login);
+ scsi_device_put(sdev);
+ scsi_remove_device(sdev);
+ goto out_logout_login;
+
} else {
+ /*
+ * Can you believe it? Everything went well.
+ */
lu->sdev = sdev;
scsi_device_put(sdev);
+ goto out;
}
+
+ out_logout_login:
+ smp_rmb(); /* generation may have changed */
+ generation = device->generation;
+ smp_rmb(); /* node_id must not be older than generation */
+
+ sbp2_send_management_orb(lu, device->node_id, generation,
+ SBP2_LOGOUT_REQUEST, lu->login_id, NULL);
+ /*
+ * If a bus reset happened, sbp2_update will have requeued
+ * lu->work already. Reset the work from reconnect to login.
+ */
+ PREPARE_DELAYED_WORK(&lu->work, sbp2_login);
out:
sbp2_target_put(lu->tgt);
}

--
Stefan Richter
-=====-==--- ---= ==-=-
http://arcgraph.de/sr/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/