Re: "hpsa: Change SAS transport devices to bus 0." commit breaks hpacucli on old controller firmware

From: Hannes Reinecke
Date: Thu Nov 17 2016 - 06:17:38 EST


On 11/16/2016 05:09 PM, brace77070@xxxxxxxxx wrote:
> On 10/31/2016 02:06 PM, Don Brace wrote:
>> On 10/27/2016 01:15 PM, Jack Suter wrote:
>>> Hi there,
>>>
>>> Commit "hpsa: Change SAS transport devices to bus 0."
>>> (09371d623c9c3dc6ed7f53ec8ab01d25f0c6c697) breaks the hpacucli utility
>>> for some HP Smart Array controllers with old firmware.
>>>
>>> Specifically, I have a P410 connected to an HP DL180 G6 running firmware
>>> version 1.66. Yes, the firmware is old, but it works. On the 4.4 series
>>> kernels and earlier, hpacucli works with no trouble. On 4.5 and later,
>>> the hpsa driver reports errors in the kernel log, and hpacucli reports
>>> "Error: No controllers detected."
>>>
>>> Oct 27 15:50:30 hostname kernel: [ 32.189495] hpsa 0000:06:00.0: scsi 0:0:0:0: added RAID HP P410 controller SSDSmartPathCap- En- Exp=1
>>> Oct 27 15:50:30 hostname kernel: [ 32.190054] hpsa 0000:06:00.0: addition failed -19, device not added.
>>>
>>> Reverting the above commit resolves both the hpsa errors and the
>>> hpacucli error when tested with kernel 4.7.9.
>>>
>>> In addition to this troublesome server, I have a handful of servers with
>>> P410 controllers and firmware versions ranging from 3.52 to 6.60. All of
>>> them work with the 09371d62 commit in place, which leads me to believe
>>> it is just this old 1.66 firmware that is incompatible.
>>>
>>> While a firmware upgrade seems like the simple solution, I think this
>>> should be considered a bug/regression due to it breaking functionality
>>> that previously worked. It appears others may have run into this issue
>>> too:
>>> http://superuser.com/questions/1093124/coreos-hp410-raid1-device-not-added-19
>>>
>>>
>>> Some dmesg output (grep -e hpsa -e sg) is below from both a 4.4.2 kernel
>>> (working) and 4.5.7 kernel (broken). Note the change in SCSI address
>>> from 0:3:0:0 to 0:0:0:0.
>>>
>>> Please let me know if you need me to do any testing to help resolve
>>> this.
>>>
>>> Jack Suter
>> I discussed this with the ssacli developers, and their tool does not
>> look at the bus. However, I do see "device not added" messages that
>> should not be there, so I'll attack your issue from that angle.
>>
>> Thanks,
>> Don Brace
>
> The root cause is that this older firmware does not report SCSI
> revision 5, so we add the controller at the end of the list instead
> of at the beginning:
>
>     if (is_scsi_rev_5(h))
>             raid_ctlr_position = 0;
>     else
>             raid_ctlr_position = nphysicals + nlogicals;
>
> So the first logical volume gets BTL 0:0:0, and when we then attempt
> to add the controller at the end of the list, it gets the same BTL
> values. That is why you get the "device not added" messages.
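
To make the collision concrete, here is a minimal C model of it. Only
raid_ctlr_position() follows the quoted snippet (the driver computes
the value inline from is_scsi_rev_5(h)); struct btl, add_devices() and
simulate_old_firmware() are hypothetical helpers, and the scan list
assumes no physical devices and a single logical volume at 0:0:0, as
described above. It is a sketch of the failure mode, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_DEVS 16

/* Hypothetical, simplified SCSI address: bus (channel), target, lun. */
struct btl {
	int bus, target, lun;
};

/* Mirrors the quoted logic: the controller is scanned first on
 * SCSI-revision-5 firmware, and after all physicals and logicals
 * otherwise. */
static int raid_ctlr_position(bool scsi_rev_5, int nphysicals, int nlogicals)
{
	return scsi_rev_5 ? 0 : nphysicals + nlogicals;
}

static bool same_btl(struct btl a, struct btl b)
{
	return a.bus == b.bus && a.target == b.target && a.lun == b.lun;
}

/* Hypothetical: "add" devices in scan order. A device whose address is
 * already held by an earlier, successfully added device is rejected.
 * Returns the number of rejected devices. */
static int add_devices(const struct btl *scan, int n)
{
	struct btl added[MAX_DEVS];
	int nadded = 0, failed = 0;

	assert(n <= MAX_DEVS);
	for (int i = 0; i < n; i++) {
		bool taken = false;

		for (int j = 0; j < nadded; j++)
			if (same_btl(scan[i], added[j]))
				taken = true;
		if (taken) {
			printf("%d:%d:%d: device not added (address in use)\n",
			       scan[i].bus, scan[i].target, scan[i].lun);
			failed++;
		} else {
			added[nadded++] = scan[i];
		}
	}
	return failed;
}

/* Hypothetical old-firmware scenario: one logical volume at 0:0:0, no
 * physical devices, and the controller appended at the end of the list
 * on bus hba_bus. Returns the number of rejected devices. */
static int simulate_old_firmware(int hba_bus)
{
	struct btl scan[2];
	int pos = raid_ctlr_position(false, 0, 1); /* == 1: last slot */

	scan[0] = (struct btl){ 0, 0, 0 };         /* first logical volume */
	scan[pos] = (struct btl){ hba_bus, 0, 0 }; /* the controller */
	return add_devices(scan, 2);
}
```

In this model, simulate_old_firmware(0) rejects one device (the
controller collides with the volume at 0:0:0), while
simulate_old_firmware(3), using the bus from before the commit,
rejects none.
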
>
> The change to bus 0 was made because the SAS transport uses bus 0,
> and there was a discrepancy between the bus the driver put the
> controller on and the bus the SAS transport actually used.
>
> I'll work on a patch to resolve this for you.
>
One can resolve this issue by checking the SCSI revision of the
controller and moving every controller with revision '0' to bus 3.
That solved the issue for me.
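
A minimal sketch of the bus selection described above, assuming the
controller's SCSI revision has already been read from its inquiry
data. The function and constant names are hypothetical; this
illustrates the idea and is not the posted patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical constants: the controller's bus before the commit (3,
 * matching the 0:3:0:0 address on 4.4) and after it (0). */
enum {
	HBA_BUS_LEGACY = 3,
	HBA_BUS_DEFAULT = 0,
};

/* Controllers reporting SCSI revision 0 keep the pre-commit bus 3;
 * all others use bus 0, as the SAS transport does. */
static int hba_bus_for_scsi_revision(int scsi_revision)
{
	assert(scsi_revision >= 0);
	return scsi_revision == 0 ? HBA_BUS_LEGACY : HBA_BUS_DEFAULT;
}

/* Format the controller's address as host:bus:target:lun, with
 * target and lun fixed at 0 in this sketch. */
static void format_hba_address(char *buf, size_t len, int host,
			       int scsi_revision)
{
	snprintf(buf, len, "%d:%d:0:0", host,
		 hba_bus_for_scsi_revision(scsi_revision));
}
```

Here format_hba_address(buf, sizeof buf, 0, 0) yields "0:3:0:0", the
address the working 4.4 kernel reported. Revisions other than 0 fall
through to bus 0, which follows the wording above literally; whether
other non-revision-5 firmware also needs bus 3 is not settled in this
thread.
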

Patch posted in a different mail.

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@xxxxxxx +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)