On 10/14/2013 03:18 PM, Hannes Reinecke wrote:
Yes, this is correct. 'REPORT LUNS' is supported in 'Unavailable' state.
On 10/14/2013 02:51 PM, Steffen Maier wrote:
On 10/14/2013 01:13 PM, Hannes Reinecke wrote:
On 10/13/2013 07:23 PM, Vaughan Cao wrote:
[1.] One line summary of the problem:
special sense code asc,ascq=04h,0Ch abort scsi scan in the middle
[2.] Full description of the problem/report:
For instance, the storage presents 8 iSCSI LUNs, however LUN No.7
is not well configured or has something wrong.
Then these messages are received:
kernel: scsi 5:0:0:0: Unexpected response from lun 7 while scanning, scan aborted
Which will make LUN No.8 unavailable.
It's confirmed that Windows and Solaris systems will continue the
scan and make LUN No.1,2,3,4,5,6 and 8 available.
Log snippet is as below:
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: scsi scan: INQUIRY pass 1 length 36
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Send: 0xffff8801e9bd4280
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: CDB: Inquiry: 12 00 00 00 24 00
Aug 24 00:32:49 vmhodtest019 kernel: buffer = 0xffff8801f71fc180, bufflen = 36, queuecommand 0xffffffffa00b99e7
Aug 24 00:32:49 vmhodtest019 kernel: leaving scsi_dispatch_cmnd()
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Done: 0xffff8801e9bd4280 SUCCESS
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: CDB: Inquiry: 12 00 00 00 24 00
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Sense Key : Not Ready [current]
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: Add. Sense: Logical unit not accessible, target port in unavailable state
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:7: scsi host busy 1 failed 0
Aug 24 00:32:49 vmhodtest019 kernel: 0 sectors total, 36 bytes done.
Aug 24 00:32:49 vmhodtest019 kernel: scsi scan: INQUIRY failed with code 0x8000002
Aug 24 00:32:49 vmhodtest019 kernel: scsi 5:0:0:0: Unexpected response from lun 7 while scanning, scan aborted
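The failure code 0x8000002 in the log above can be split into its bytes. The following is only a sketch, assuming the midlayer result layout of that era (driver byte, host byte, message byte, status byte, from high to low); the struct and function names are made up for illustration and are not kernel API.

```c
#include <stdint.h>

/* Hypothetical container for the four bytes packed into a SCSI result. */
struct scsi_result_fields {
	uint8_t driver;	/* bits 31..24 */
	uint8_t host;	/* bits 23..16 */
	uint8_t msg;	/* bits 15..8  */
	uint8_t status;	/* bits 7..0, raw SAM status */
};

/* Split a 32-bit result value into its bytes (assumed layout, see above). */
static struct scsi_result_fields decode_scsi_result(uint32_t result)
{
	struct scsi_result_fields f;

	f.driver = (result >> 24) & 0xff;
	f.host   = (result >> 16) & 0xff;
	f.msg    = (result >> 8) & 0xff;
	f.status = result & 0xff;
	return f;
}
```

With 0x8000002 this gives driver byte 0x08 and status 0x02, which, if I read the headers of that time correctly, are DRIVER_SENSE and CHECK CONDITION. That would match the sense data printed just before the failure.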
Looking at scsi_report_lun_scan(), I found:
Linux uses an INQUIRY command to probe each LUN listed in the result
of the REPORT LUNS command.
It assumes every probe command will get a legal result. Otherwise, it
regards the whole peripheral as nonexistent or dead.
If the INQUIRY response passes its validity checks and indicates
'LUN not present', the loop does not break but continues with the scan
process.
In the log, the INQUIRY to LUN 7 returns a sense code asc,ascq=04h,0Ch
(Logical unit not accessible, target port in unavailable state).
This sense code is not handled specially, so scsi_probe_lun() returns -EIO
and the scan process is aborted.
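To make the difference between the two behaviours concrete, here is a small userspace model of the scan loop described above. It is a sketch only: the enum, the callback type and the function names are invented for illustration and are not the kernel's, and the probe simply pretends that LUN 7 gives no usable response.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the probe outcomes discussed above. */
enum probe_result {
	PROBE_LUN_PRESENT,
	PROBE_LUN_NOT_PRESENT,
	PROBE_NO_RESPONSE
};

typedef enum probe_result (*probe_fn)(unsigned int lun);

/*
 * Walk the LUN list returned by REPORT LUNS and probe each entry.
 * Returns the number of LUNs that were added.
 * abort_on_no_response != 0 models "scan aborted",
 * abort_on_no_response == 0 models "skip this LUN and go on".
 */
static size_t scan_reported_luns(const unsigned int *luns, size_t count,
				 probe_fn probe, int abort_on_no_response)
{
	size_t added = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		enum probe_result res = probe(luns[i]);

		if (res == PROBE_NO_RESPONSE) {
			if (abort_on_no_response)
				break;		/* remaining LUNs are never probed */
			continue;		/* ignore only this LUN */
		}
		if (res == PROBE_LUN_PRESENT)
			added++;
	}
	return added;
}

/* Example probe: every LUN answers except LUN 7. */
static enum probe_result probe_lun7_broken(unsigned int lun)
{
	return lun == 7 ? PROBE_NO_RESPONSE : PROBE_LUN_PRESENT;
}
```

For the LUN list 1..8 the aborting variant adds LUNs 1 to 6 only, while the continuing variant also adds LUN 8, which is the behaviour reported for Windows and Solaris.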
I have two questions:
1. Is it correct for hardware to return a sense 04h,0Ch to inquiry
again, even after presenting this lun in response to REPORT_LUN
command?
2. Since Windows and Solaris can continue the scan, is it reasonable for
Linux to do the same, even for a fault-tolerance purpose?
Hmm. Yes, and no.
_Actually_ this is an issue with the target, as it looks as if it
will return the above sense code while sending an 'INQUIRY' to the
device.
SPC explicitly states that the INQUIRY command should _not_ fail
for unavailable devices.
But yeah, we probably should work around this issue.
Nevertheless, please raise this issue with your array vendor.
Please try the attached patch.
Cheers,
Hannes
From b0e90778f012010c881f8bdc03bce63a36921b77 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@xxxxxxx>
Date: Mon, 14 Oct 2013 13:11:22 +0200
Subject: [PATCH] scsi_scan: continue report_lun_scan after error
When scsi_probe_and_add_lun() fails in scsi_report_lun_scan() this
does _not_ indicate that the entire target is done for.
So continue scanning for the remaining devices.
Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 307a811..973a121 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1484,13 +1484,12 @@ static int scsi_report_lun_scan(struct scsi_target *starget, int bflags,
 				lun, NULL, NULL, rescan, NULL);
 			if (res == SCSI_SCAN_NO_RESPONSE) {
 				/*
-				 * Got some results, but now none, abort.
+				 * Got some results, but now none, ignore.
 				 */
 				sdev_printk(KERN_ERR, sdev,
 					"Unexpected response"
-					" from lun %d while scanning, scan"
-					" aborted\n", lun);
-				break;
+					" from lun %d while scanning,"
+					" ignoring device\n", lun);
 			}
 		}
 	}
In LLDDs that do their own initiator based LUN masking (because the midlayer does not have this
functionality to enable hardware virtualization without NPIV, or to work around suboptimal LUN
masking on the target), they are likely to return -ENXIO from slave_alloc(), making
scsi_alloc_sdev() return NULL, being converted to SCSI_SCAN_NO_RESPONSE by
scsi_probe_and_add_lun() and thus going through the same code path above.
Ah. Hmm. Yes, they would.
However, I personally would question this approach, as SPC states that
The REPORT LUNS command (see table 284) requests the device
server to return the peripheral device logical unit inventory
accessible to the I_T nexus.
So by plain reading this would mean that you either should modify
'REPORT LUNS' to not show the masked LUNs, or set the pqual field to
'0x10' or '0x11' for those LUNs.
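For reference, the peripheral qualifier lives in the top three bits of byte 0 of the standard INQUIRY data, with the peripheral device type in the low five bits. The values written above as '0x10' and '0x11' are presumably the binary qualifiers 001b and 011b. A minimal decoding sketch, with helper names that are made up for illustration:

```c
#include <stdint.h>

/* Peripheral qualifier: bits 7..5 of INQUIRY data byte 0. */
static unsigned int inquiry_peripheral_qualifier(uint8_t byte0)
{
	return (byte0 >> 5) & 0x07;
}

/* Peripheral device type: bits 4..0 of INQUIRY data byte 0. */
static unsigned int inquiry_device_type(uint8_t byte0)
{
	return byte0 & 0x1f;
}
```

A first byte of 0x7f, for example, decodes to qualifier 3 (011b) with device type 1Fh, which is how SPC describes a logical unit the target cannot support.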
E.g. zfcp does return -ENXIO if the particular LUN was not made known to the unit whitelist
(via zfcp sysfs attribute unit_add).
If we attach LUN 0 (via unit_add) and trigger a target scan with SCAN_WILD_CARD for the scsi
lun (e.g. on remote port recovery), we see exactly above error message for the first LUN in
the response of report lun which is not explicitly attached to zfcp.
IIRC, other LLDDs such as bfa also do similar stuff [http://marc.info/?l=linux-scsi&m=134489842105383&w=2].
For those cases, I think it makes sense to abort scsi_report_lun_scan().
Otherwise we would force the LLDD to return -ENXIO for every single LUN reported by report lun but not
explicitly added to the LLDD LUN whitelist; and this would likely *flood kernel messages*.
Maybe Vaughan's case needs to be distinguished in a patch.
Well, as mentioned initially, the real issue is that the target
aborts an INQUIRY while being in 'Unavailable'. Which, according to
SPC-3 (or later), is a violation of the spec.
So we _could_ just tell them to go away, but admittedly that's bad
style. Which means we'll have to implement a workaround; the above
was just a simple way of implementing it. If that's not working of
course we'll have to do something else.
What about this patch:
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 973a121..01a7d69 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -594,6 +594,19 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				     (sshdr.asc == 0x29)) &&
 				    (sshdr.ascq == 0))
 					continue;
+				/*
+				 * Some buggy implementations return
+				 * 'target port in unavailable state'
+				 * even on INQUIRY.
+				 * Set peripheral qualifier 3
+				 * for these devices.
+				 */
+				if ((sshdr.sense_key == NOT_READY) &&
+				    ((sshdr.asc == 0x04) &&
+				     (sshdr.ascq == 0x0C))) {
+					inq_result[0] = 3 << 5;
+					return 0;
+				}
 			}
 		} else {
 			/*
(watchout, linebreaks mangled and all that).
Should be working for this particular case without interrupting
normal workflow, now should it not?
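The check in the hunk above can be tried out in isolation in userspace. The sketch below assumes the usual sense key value 0x02 for NOT READY; the struct and function names are hypothetical and only mimic the three fields the patch looks at.

```c
#include <stdbool.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY 0x02	/* assumed value of NOT_READY */

/* Hypothetical reduced sense header: only the fields the check needs. */
struct sense_triple {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* True for "logical unit not accessible, target port in unavailable state". */
static bool sense_is_port_unavailable(const struct sense_triple *s)
{
	return s->sense_key == SENSE_KEY_NOT_READY &&
	       s->asc == 0x04 && s->ascq == 0x0C;
}

/*
 * If the failed INQUIRY carried that sense code, fake byte 0 of the
 * INQUIRY data with peripheral qualifier 3 and report success.
 * Returns true when the byte was written, false otherwise.
 */
static bool fake_inquiry_if_unavailable(const struct sense_triple *s,
					uint8_t *inq_byte0)
{
	if (!sense_is_port_unavailable(s))
		return false;
	*inq_byte0 = 3 << 5;	/* 0x60: qualifier 011b, device type bits 0 */
	return true;
}
```

Any other sense data, for example a UNIT ATTENTION with asc 0x29, leaves the buffer untouched, so the normal error handling would still apply.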