Re: [PATCH 9/9] firewire: fw-sbp2: fix I/O errors during reconnect

From: Jarod Wilson
Date: Mon Feb 11 2008 - 13:10:28 EST


Stefan Richter wrote:
While fw-sbp2 takes the necessary time to reconnect to a logical unit
after bus reset, the SCSI core keeps sending new commands. They are all
immediately completed with host busy status, and application clients or
filesystems will break quickly. The SCSI device might even be taken
offline: http://bugzilla.kernel.org/show_bug.cgi?id=9734

The only remedy seems to be to block the SCSI device until reconnect.
Alas, the SCSI core has no useful API to block only one logical unit, i.e.
the scsi_device; therefore we block the entire Scsi_Host. This
currently corresponds to an SBP-2 target. In case of targets with
multiple logical units, we need to satisfy the dependencies between
logical units by carefully tracking the blocking state of the target and
its units. We block all logical units of a target as soon as one of
them needs to be blocked, and keep them blocked until all of them are
ready to be unblocked.
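
The "block all on the first request, unblock after the last" policy described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: the names model_target, model_block, and model_unblock are made up for illustration, locking and atomics are omitted, and host_blocked merely stands in for the state that scsi_block_requests() and scsi_unblock_requests() would toggle on the Scsi_Host.

```c
#include <stdbool.h>

#define MODEL_MAX_LU 4	/* hypothetical upper bound, for the model only */

/* Hypothetical model of one SBP-2 target with several logical units. */
struct model_target {
	bool lu_blocked[MODEL_MAX_LU];	/* per-unit "needs blocking" flag */
	int blocked_count;		/* number of units currently flagged */
	bool host_blocked;		/* stands in for the Scsi_Host block state */
};

/* Flag one unit; the first flagged unit blocks the whole target. */
static void model_block(struct model_target *t, int lu)
{
	if (t->lu_blocked[lu])
		return;			/* already flagged, nothing to do */
	t->lu_blocked[lu] = true;
	if (++t->blocked_count == 1)
		t->host_blocked = true;
}

/* Clear one unit; only the last cleared unit unblocks the target. */
static void model_unblock(struct model_target *t, int lu)
{
	if (!t->lu_blocked[lu])
		return;			/* was not flagged */
	t->lu_blocked[lu] = false;
	if (--t->blocked_count == 0)
		t->host_blocked = false;
}
```

In this model, blocking units 0 and 1 and then unblocking only unit 0 leaves host_blocked set; it clears once unit 1 is unblocked as well. A real driver would additionally have to make the counter updates safe against concurrent callers.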

Furthermore, as the history of the old sbp2 driver has shown, the
scsi_block_requests() API is a minefield with high potential of
deadlocks. We therefore take extra measures to keep logical units
unblocked during __scsi_add_device() and during shutdown.

Signed-off-by: Stefan Richter <stefanr@xxxxxxxxxxxxxxxxx>

+/*
+ * Blocks lu->tgt if all of the following conditions are met:
+ *   - Login, INQUIRY, and high-level SCSI setup of all logical units of the
+ *     target have been successfully finished (indicated by dont_block == 0).
+ *   - The lu->generation is stale.  sbp2_reconnect will unblock lu later.
+ */
+static void sbp2_conditionally_block(struct sbp2_logical_unit *lu)
+{
+	struct fw_card *card = fw_device(lu->tgt->unit->device.parent)->card;
+
+	if (!atomic_read(&lu->tgt->dont_block) &&
+	    lu->generation != card->generation &&
+	    atomic_cmpxchg(&lu->blocked, 0, 1) == 0) {

Just to be absolutely sure, we don't need any barriers here to ensure we get the right generations, do we?

Also, this isn't expected to let I/O survive a disk being unplugged briefly, then plugged back in, is it? (I recall that being discussed, but I think it was as a 'would be nice to do in the future' thing).

--
Jarod Wilson
jwilson@xxxxxxxxxx
