RE: [PATCH RESEND 1/2] Drivers: scsi: storvsc: Set the scsi resultcorrectly when SRB status is INVALID

From: KY Srinivasan
Date: Tue Mar 20 2012 - 10:42:55 EST




> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@xxxxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, March 20, 2012 4:52 AM
> To: KY Srinivasan
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> devel@xxxxxxxxxxxxxxxxxxxxxx; ohering@xxxxxxxx; hch@xxxxxxxxxxxxx; linux-
> scsi@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH RESEND 1/2] Drivers: scsi: storvsc: Set the scsi result correctly
> when SRB status is INVALID
>
> On Mon, 2012-03-19 at 22:52 +0000, KY Srinivasan wrote:
> > > > However, keep in mind
> > > > that there is no current ETA on when Windows will ship with these changes
> -
> > > Windows 8
> > > > may ship with code where they would return an invalid SRB status, but they
> are
> > > not
> > > > setting the sense code, hence this patch. When the Window host does the
> > > "right thing"
> > > > I will clean this up, but I don't know when that will be.
> > >
> > > I thought you just said you'd only just asked them if they could
> > > implemented it, in which case no version of windows currently ships with
> > > this, correct?
> >
> > There are some private builds of windows 8 floating around with this change,
> where
> > they are returning ILLEGAL_REQUEST SRB status without any sense data.
>
> Sure, but they're not shipped, right ... it's like the test builds we do
> for large companies like IBM and HP to try out certain things before
> deciding they don't work.

They are close to shipping and it is very difficult to get any changes in
presently. Furthermore, this is only on windows8; none of the prior
versions of windows servers of interest support this. I am starting an effort to
get this change into prior windows servers. Once again, it is not clear when
these changes will be pushed out.


>
> > > > More importantly, the second patch in this series where I filter out
> > > > the ATA_16 command
> > > > on the guest is really important for us. Without that patch on a range
> > > > on windows hosts
> > > > including the current beta version of windows8 where the host is
> > > > returning a generic
> > > > error in response to ATA_16 command, we cannot boot many Linux
> > > > distros. If you
> > > > prefer, I can drop the first patch and re-submit the second patch for
> > > > consideration now.
> > >
> > > I'm not sure about that either. You presumably translate
> > > SRB_STATUS_ERROR into DID_TARGET_FAILURE. That should cause the
> > > termination of the command with prejudice in exactly the same way as an
> > > ILLEGAL_REQUEST sense code would (minus the useful error information),
> > > so what's causing the boot failure?
> >
> > You are right, currently without a proper SRB code, I do a DID_TARGET_FAILURE
> and
> > this results in the device being offlined and if the device happens to be the root
> device,
> > we obviously cannot boot. I have seen this problem with sles11 sp2 on a win8
> box.
>
> OK, so this may be the root cause of the problem. DID_TARGET_FAILURE
> returns FAILED from scsi_decide_disposition(). This wakes up the error
> handler to retry the command and, since the command is never going to
> work, this ends up offlining the device. The same thing will happen for
> every command with no recovery.
>
> The question now is, what else returns SRB_STATUS_ERROR? If it's always
> for stuff that's unretryable, then the DID_ error is wrong and you
> should be returning DID_PASSTHROUGH with an error code and the problem
> will be solved. If we can get SRB_STATUS_ERROR on retryable commands,
> then you discriminate at the point of failure, not at the point of input
> and return DID_TARGET_FAILURE for the ones that should be retried and
> DID_PASSTHROUGH + error for the ones that shouldn't. This will ensure
> the driver is completely backwards compatible and that it will work
> if/when windows properly handles the commands.

James, unfortunately based on the current SRB codes I get back from the
host, I don't know which commands that failed ought to be retried and which
ones should not be; I simply get a single SRB error code for cases where the
host filtered the unsupported commands as well as the case where the host
supported the command and something failed in the command execution.
If there is something I can try in this driver to fix this problem, I am more than
happy to try it. If it involves getting changes into the host (win8, win2k8 etc.),
I am willing to start a conversation with the relevant teams, but I cannot
obviously determine when such changes will ship. However, I do need
solution for the problem now.

I appreciate your taking the time to help me gravitate towards the
correct solution here. Given my constraints, let me know what is the
best way forward here.

Regards,

K. Y

èº{.nÇ+‰·Ÿ®‰­†+%ŠËlzwm…ébëæìr¸›zX§»®w¥Š{ayºÊÚë,j­¢f£¢·hš‹àz¹®w¥¢¸ ¢·¦j:+v‰¨ŠwèjØm¶Ÿÿ¾«‘êçzZ+ƒùšŽŠÝj"ú!¶iO•æ¬z·švØ^¶m§ÿðà nÆàþY&—