Re: [PATCH 1/2] IB/hfi1: Try slot reset before secondary bus reset

From: Dennis Dalessandro
Date: Fri Apr 20 2018 - 10:12:40 EST


On 4/19/2018 6:11 PM, Deucher, Alexander wrote:
-----Original Message-----
From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
Sent: Thursday, April 19, 2018 5:47 PM
To: Jason Gunthorpe <jgg@xxxxxxxx>
Cc: Sinan Kaya <okaya@xxxxxxxxxxxxxx>; Bjorn Helgaas
<bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx;
sulrich@xxxxxxxxxxxxxx; timur@xxxxxxxxxxxxxx; linux-arm-
msm@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; Mike
Marciniszyn <mike.marciniszyn@xxxxxxxxx>; Dennis Dalessandro
<dennis.dalessandro@xxxxxxxxx>; Doug Ledford <dledford@xxxxxxxxxx>;
open list:HFI1 DRIVER <linux-rdma@xxxxxxxxxxxxxxx>; open list <linux-
kernel@xxxxxxxxxxxxxxx>; Deucher, Alexander
<Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH 1/2] IB/hfi1: Try slot reset before secondary bus reset

[+cc Alex, who might know why DRM drivers have their own PCIe Gen3
code]

On Thu, Apr 19, 2018 at 02:26:32PM -0600, Jason Gunthorpe wrote:
On Thu, Apr 19, 2018 at 03:56:23PM -0400, Sinan Kaya wrote:
The infiniband adapter might be connected to a PCI hotplug slot.
Performing secondary bus reset on a hotplug slot causes PCI link
up/down interrupts.

Hotplug driver removes the device from system when a link down
interrupt is observed and performs re-enumeration when link up
interrupt is observed.

This conflicts with what this code is trying to do. Try secondary
bus reset only if pci_reset_slot() fails/unsupported.

Signed-off-by: Sinan Kaya <okaya@xxxxxxxxxxxxxx>
drivers/infiniband/hw/hfi1/pcie.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hfi1/pcie.c
b/drivers/infiniband/hw/hfi1/pcie.c
index 83d66e8..75f49e3 100644
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -908,7 +908,8 @@ static int trigger_sbr(struct hfi1_devdata *dd)

The code above this hunk is:

/*
* Trigger a secondary bus reset (SBR) on ourselves using our parent.
*
* Based on pci_parent_bus_reset() which is not exported by the
* kernel core.
*/
static int trigger_sbr(struct hfi1_devdata *dd) {

[..]

This really seems like something the PCI core should be helping with,
drivers shouldn't be doing stuff like this. I get the feeling this
should be a common need if drivers support various error recovery
schemes?

Bjorn, would be appropriate to export pci_parent_bus_reset() or some
variation therin??

I agree it would be really nice if the PCI core could help out somehow so we
could get some of this code out of individual drivers.

If fact, stepping back a few paces, this HFI reset path is part of a transition to
PCIe gen3 signaling, and I'm not sure why *that* is in the driver either.

There's an ongoing discussion [1] about why this gen3 code is in the driver.
Several DRM drivers include similar code (cik_pcie_gen3_enable(),
si_pcie_gen3_enable()).

I *thought* the hardware was supposed to automatically negotiate to the
highest rate supported by both sides without any help at all from software.
But since several drivers have code to do it themselves, I wonder if I'm
missing something, or maybe there's something the PCI core should be doing
that it isn't, and the driver code is basically working around that PCI core
deficiency.

My understanding was that some platfoms only bring up the link in gen 1 mode for compatibility reasons. TBH, I'm not that familiar with how the links come up on different platforms.

Alex


I'm checking with our HW folks and the guys that wrote the PCI code in our driver. I quite frankly just don't recall what all is going on here. Let me find out for sure before I give out wrong information.

-Denny