Re: [EXT] VPD access Blocked by commit 0d5370d1d85251e5893ab7c90a429464de2e140b

From: Himanshu Madhani
Date: Thu May 30 2019 - 15:37:01 EST


Hi Bjorn,

Thank you for responding to my question.

> On May 21, 2019, at 1:11 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>
> External Email
>
> ----------------------------------------------------------------------
> [fix linux-pci, remove ethan.zhao (bounces)]
>
> From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Date: Tue, May 21, 2019 at 3:02 PM
> To: Himanshu Madhani
> Cc: ethan.zhao@xxxxxxxxxx, Andrew Vasquez, Girish Basrur, Giridhar
> Malavali, Myron Stowe, <linux-pci@xxxxxxxxxx>, Linux Kernel Mailing
> List, Quinn Tran
>
>> [+cc Myron, Quinn, linux-pci, linux-kernel]
>>
>> From: Himanshu Madhani <hmadhani@xxxxxxxxxxx>
>> Date: Fri, May 17, 2019 at 5:21 PM
>> To: ethan.zhao@xxxxxxxxxx, bhelgaas@xxxxxxxxxx
>> Cc: Andrew Vasquez, Girish Basrur, Giridhar Malavali
>>
>>> Hi Ethan,
>>>
>>> Our OEM partners reported to us that VPD access with latest distros were returning I/O error for them. They indicated this to be issue only with newer kernels.
>>>
>>> One of the distro vendor pointed out patch posted by you to be reason for IO error trying to VPD. The patch looks like blocks access to VPD by blacklisting ISP.
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d5370d1d85251e5893ab7c90a429464de2e140bï;
>>>
>>> I setup PCIe analyzer to reproduce this in our lab to root cause it and discovered that after reverting the patch. I am able to get VPD data okay with upstream 5.1.0 and I used RHEL8.
>>>
>>> I also used "lspci" and "cat" to dump out VPD data and do not see any issue.
>>>
>>> # lspci -vvv -s 03:00.0
>>> 03:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
>>> Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre Channel to PCIe Adapter
>>> Physical Slot: 15
>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> Latency: 0, Cache Line Size: 64 bytes
>>> Interrupt: pin A routed to IRQ 67
>>> NUMA node: 0
>>> Region 0: Memory at fbe05000 (64-bit, prefetchable) [size=4K]
>>> Region 2: Memory at fbe02000 (64-bit, prefetchable) [size=8K]
>>> Region 4: Memory at fbd00000 (64-bit, prefetchable) [size=1M]
>>> Expansion ROM at fb540000 [disabled] [size=256K]
>>> Capabilities: [44] Power Management version 3
>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>> Capabilities: [4c] Express (v2) Endpoint, MSI 00
>>> DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>>> DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>>> MaxPayload 256 bytes, MaxReadReq 4096 bytes
>>> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>> LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
>>> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>> LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>> DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>>> AtomicOpsCtl: ReqEn-
>>> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>>> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>>> Compliance De-emphasis: -6dB
>>> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
>>> EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>>> Capabilities: [88] Vital Product Data
>>> Product Name: QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter
>>> Read-only fields:
>>> [PN] Part number: QLE2742
>>> [SN] Serial number: RFD1706R22611
>>> [EC] Engineering changes: BK3210408-05 04
>>> [V9] Vendor specific: 010189
>>> [RV] Reserved: checksum good, 0 byte(s) reserved
>>> End
>>> Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
>>> Vector table: BAR=2 offset=00000000
>>> PBA: BAR=2 offset=00001000
>>> Capabilities: [9c] Vendor Specific Information: Len=0c <?>
>>> Capabilities: [100 v1] Advanced Error Reporting
>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>> HeaderLog: 00000000 00000000 00000000 00000000
>>> Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
>>> ARICap: MFVC- ACS-, Next Function: 1
>>> ARICtl: MFVC- ACS-, Function Group: 0
>>> Capabilities: [1c0 v1] #19
>>> Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
>>> Kernel driver in use: qla2xxx
>>> Kernel modules: qla2xxx
>>>
>>> # cat /sys/bus/pci/devices/0000\:03\:00.0/vpd
>>> RFD1706R22611ECBK3210408-05 04V9010189RVïx
>>>
>>> Can you share some more insight into where you encountered issue? I am in process of reverting this patch from upstream kernel but wanted to reach out and find out if you still have setup to provide more context.
>>
>> 0d5370d1d852 ("PCI: Prevent VPD access for QLogic ISP2722") prevented
>> a panic while reading VPD, so we can't simply revert it.
>>
>> Since you don't see a panic while reading VPD from that device, it's
>> possible that a QLogic firmware change fixed the VPD format so Linux
>> no longer reads the area that caused the problem. Or possibly your
>> system doesn't handle the config read error the same way Ethan's HP
>> DL380 does. Unfortunately we don't have an actual PCIe analyzer trace
>> from Ethan's system, so we don't know exactly what happened on PCIe.
>>
>> I suggest that you capture the entire VPD area and hexdump it, e.g.,
>> with "xxd", and look at its structure. pci_vpd_size() parses it and
>> computes the valid size based on a PCI_VPD_STIN_END tag, and
>> pci_vpd_read() should not read past that size.
>>
>> And you *do* have an analyzer trace. If new QLogic firmware fixed the
>> VPD format, the trace should show that Linux read only the valid part
>> of VPD, and there should be no errors in the trace. Then it might
>> just be a question of tweaking the quirk so it allows VPD reads if the
>> firmware is new enough.
>>
>> But if the trace does show config reads with errors, then it might be
>> that your system just tolerates the errors while the DL380 did not.
>> Then we'd have to figure out exactly what the error was and how to
>> deal with it so things work on both your system and Ethan's.
>>
>> Bjorn


I have verified after reverting patch on HP DL380 Gen10 hardware by capturing PCIe trace

Hereâs snippet from dmidecode output

System Information:
Manufacturer: HPE
Product Name: ProLiant DL380 Gen10
Family: ProLiant

We are able to successfully read VPD config data using lspci and cat command

# lspci -d1077:
13:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
13:00.1 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
[root@R8 ~]# lspci -vvv -s 13:00.0
13:00.0 Fibre Channel: QLogic Corp. ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter (rev 01)
Subsystem: QLogic Corp. QLE2742 Dual Port 32Gb Fibre Channel to PCIe Adapter
Physical Slot: 3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 28
NUMA node: 0
Region 0: Memory at de605000 (64-bit, prefetchable) [size=4K]
Region 2: Memory at de602000 (64-bit, prefetchable) [size=8K]
Region 4: Memory at de500000 (64-bit, prefetchable) [size=1M]
[virtual] Expansion ROM at de800000 [disabled] [size=256K]
Capabilities: [44] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [4c] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <4us, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+, LTR-, OBFF Not Supported
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [88] Vital Product Data
Product Name: QLogic 32Gb 2-port FC to PCIe Gen3 x8 Adapter
Read-only fields:
[PN] Part number: QLE2742
[SN] Serial number: AFD1533Y02999
[EC] Engineering changes: BK3210407-05 03
[V9] Vendor specific: 010189
[RV] Reserved: checksum good, 0 byte(s) reserved
End
Capabilities: [90] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00001000
Capabilities: [9c] Vendor Specific Information: Len=0c <?>
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [154 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [1c0 v1] #19
Capabilities: [1f4 v1] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
Kernel driver in use: qla2xxx
Kernel modules: qla2xxx

We also verified this same configuration on a SuperMicro X10SRA-F server (which i had sent in earlier email)â
and were able to verify that the VPD read was good and there were no errors on PCIe trace. Given this information, Please consider reverting the patch until we further debug the issue and resolve as it is
affecting general availability of our adapter.

Thanks,
Himanshu