Re: [RFC PATCH] scsi: libsas: fix WARN on device removal

From: John Garry
Date: Tue Nov 22 2016 - 11:57:19 EST


On 21/11/2016 17:13, Dan Williams wrote:
On Mon, Nov 21, 2016 at 7:16 AM, John Garry <john.garry@xxxxxxxxxx> wrote:
@Maintainers, would you be willing to accept this patch as an interim
fix
for the dastardly WARN while we try to fix the flutter issue?



To me this adds a bug to quiet a benign, albeit noisy, warning.


What is the bug which is being added?


The bug where we queue a port teardown, but see a port formation event
in the meantime.


As I understand, this vulnerability already exists:
http://marc.info/?l=linux-scsi&m=143801026028006&w=2

I actually don't understand how libsas dealt with flutter (which I take to
mean a burst of up and down events) before these changes, as it can only
queue simultaneously one up and one down event per port. So, if we get a
flutter, then the events are lost and we get indeterminate state.


The events are not lost.

In sas_queue_event(), if there is a particular event pending for a port/PHY, we cannot queue further same event types for that port/PHY. I think my colleagues found issue where we try to enqueue multiple complementary events.

The new problem this patch introduces is
delaying sas port deletion where it was previously immediate. So now
we can get into a situation where the port has gone down and can start
processing a port up event before the previous deletion work has run.


And it's a very noisy warning, as in 6K lines on the console when an
expander is unplugged.


Does something like this modulate the failure?

I'm curious if we simply need to fix the double deletion of the
sas_port bsg queue, could you try the changes below?


No, I just tested it on a root port and we get the same WARN.


diff --git a/drivers/scsi/scsi_transport_sas.c
b/drivers/scsi/scsi_transport_sas.c index
60b651bfaa01..11401e5c88ba 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host
*shost, struct sas_rphy *rphy
{
struct request_queue *q;

- if (rphy)
+ if (rphy) {
q = rphy->q;
- else
+ rphy->q = NULL;
+ } else
q = to_sas_host_attrs(shost)->q;

if (!q)

.




.