Re: [PATCH] Revert "scsi: mpt3sas: Fix secure erase premature termination"

From: James Bottomley
Date: Mon Jan 16 2017 - 09:24:23 EST


On Mon, 2017-01-16 at 10:22 +0100, Ingo Molnar wrote:
> * James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Sun, 2017-01-15 at 10:19 +0100, Ingo Molnar wrote:
> > > So there's a new mpt3sas SCSI driver boot regression, introduced
> > > in this merge window, which made one of my servers unbootable.
> >
> > We're not reverting a fix that would cause regressions for others.
>
> You really need to reconsider that stance ...
>
> > However, the fix was manifestly wrong, so does this fix of the fix
> > work for you:
> >
> > http://marc.info/?l=linux-scsi&m=148329237807604
> >
> > It's been languishing a bit because no-one seemed to care enough to
> > test or review it. If you can add a Tested-by, that will give the
> > two we need to push it.
>
> I have tested your other patch that you pointed to:
>
> http://marc.info/?l=linux-scsi&m=148449968522828
>
> Which patch fixes the bug too (I removed my revert first) - so you
> can add my:
>
> Reported-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Tested-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks ... just checking you tested the second version with the
concurrency part?

> BTW., is it wise to work around the out of spec firmware in the
> mpt3sas code and leave the overly optimistic assumptions in the SCSI
> code intact? The problem is that other SCSI hardware could be
> affected as well - and especially enterprise class server hardware
> has long testing and thus regression latencies (as my example
> proves).

Realistically, there is no other card. Every other SAS implementation
uses the in-kernel SAT, which does the right thing. We've suggested on
a few occasions that the mpt SAS might like to use it as well, given we
keep tripping on SAT problems in their firmware.

> Wouldn't it be more robust to only submit one pass-through command at
> a time from the SCSI layer, and maybe opt-in hardware that is known
> to implement the SAT standard fully?

Unfortunately it's a lot more complex: the standard gives a queueing
mechanism for SAT pass through, so mostly you *can* have multiple
commands outstanding, so it looks like we shouldn't globally restrict
that. However, it seems the mpt3 firmware is using a single-command
queue model *and* not doing the right thing with return codes, hence
the failure. Since the failure mode is mpt3 specific, I think the best
place for the fix is in their code. We can revisit this decision if
something else comes along that also has this problem (UAS springs to
mind).
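
To make the driver-local approach concrete, here is a minimal userspace
sketch of gating pass-through commands to one outstanding per device
while leaving ordinary commands unrestricted. It is not the actual
mpt3sas patch: struct dev_state, submit_cmd(), complete_cmd() and the
SUBMIT_* codes are hypothetical names, and C11 atomic_flag stands in for
whatever primitive a real driver would use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state; names are illustrative only. */
struct dev_state {
    atomic_flag ata_pending;  /* set while a pass-through command is in flight */
};

enum submit_result { SUBMIT_OK, SUBMIT_BUSY };

/*
 * Admit at most one pass-through command per device at a time.
 * Ordinary commands are never gated, so normal queueing is unaffected.
 */
static enum submit_result submit_cmd(struct dev_state *dev, bool is_passthrough)
{
    /* test_and_set returns the previous value: true means one is in flight */
    if (is_passthrough && atomic_flag_test_and_set(&dev->ata_pending))
        return SUBMIT_BUSY;   /* caller is expected to requeue and retry */
    return SUBMIT_OK;
}

/* Called on completion; reopens the gate for the next pass-through command. */
static void complete_cmd(struct dev_state *dev, bool is_passthrough)
{
    if (is_passthrough)
        atomic_flag_clear(&dev->ata_pending);
}
```

A device would be initialised with .ata_pending = ATOMIC_FLAG_INIT. The
point of the design is that the restriction lives with the one device
model that needs it, so hardware that implements SAT queueing correctly
keeps multiple commands outstanding.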

James


> (But I'm just kibitzing here really.)