RE: [PATCH v3 4/8] pciehp: Don't disable the link permanently,during removal

From: Rajat Jain
Date: Tue Jan 07 2014 - 13:21:02 EST


Hello Bjorn / Yinghai,

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> Sent: Monday, January 06, 2014 4:04 PM
> To: Rajat Jain
> Cc: Rajat Jain; Kenji Kaneshige; Alex Williamson; Yijing Wang; linux-
> pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Yinghai Lu; Guenter
> Roeck; Rajat Jain; Yinghai Lu
> Subject: Re: [PATCH v3 4/8] pciehp: Don't disable the link permanently,
> during removal
>
> On Sun, Jan 5, 2014 at 10:53 AM, Rajat Jain <rajatxjain@xxxxxxxxx>
> wrote:
> > Hello Bjorn,
> >
> > Just checking on the fate of this patch set...
> >
> > On Tue, Dec 17, 2013 at 5:02 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> wrote:
> >> [+cc yinghai@xxxxxxxxxx (seems to be Yinghai's preferred email]
> >>
> >> On Tue, Dec 17, 2013 at 12:06:05PM -0800, Rajat Jain wrote:
> >>> We need future link up events for hot-add, thus don't disable the
> >>> link permanently during device removal. Also, remove the static
> >>> functions that are now left unused.
> >>
> >> The changelog should mention that this reverts part of 2debd9289997
> ("PCI:
> >> pciehp: Disable/enable link during slot power off/on").
> >
> > Sure. Do you want me to submit another patch set (bumping up the
> > version) with this change log, or you'd want to add this change log
> > while merging?
> >
> >>
> >> Yinghai, can you tell us whether this is an issue on your systems?
> >
> > As Yinghai confirms further down this thread, his issue was confirmed
> > by Intel to be a bug in the repeater chip.
> > ----------------------------------
> > Yinghai writes:
> >> According to HW guys and Intel, that should be bug of repeater.
> >>
> > ---------------------------------
> > I don't know about the details of his scenario, except that when the
> > adapter was disabled the repeater kept on flapping the link up & down
> > (and hence disabling the link solved the problem then). Yinghai
> > couldn't test, but I believe with this patch even if we disable
> > presence detect interrupt, the "adapter present / no present" messages
> > would (rightly) convert to "Link Up / Link Down" messages (since the
> > repeater keeps on flapping the link).
> >
> > Since it is a platform specific bug, I'm not sure what can be done to
> > remove those messages except may be reduce the verbosity? If you'd
> > like I could change all the INFO messages to DBG messages.
>
> Even if it's a defect in a particular piece of hardware, I don't want to
> regress on that hardware, even if the regression is just extra messages
> that we didn't see before.
>
> I think ideally we would add some sort of quirk for that hardware so it
> works just as well as it does today. I think extra messages will lead
> to a bug reports from users.

Sure, I can do that. I think what the quirk would have to do is that for that particular platform, don't enable the link-state based hotplug. (Since link-state hotplug will not work if we disable the link permanently as we do today on card removal).

But the question is how to determine that the quirk has to be applied? I think the objective is to apply the quirk to the platforms that have a "PCIe repeater". Since this does not depend on a PCI device / vendor ID, and I think the PCIe repeater is probably not even visible to the pciehp or the PCI subsystem, how do I determine that the quirk has to be applied?

If (hw_has_pcie_repeater)
Don't use link-state hotplug (and disable link permanently during card removal)
Else
Use link-state hotplug (and don't disable the link permanently)


Yinghai: Since I do not have that hardware, I will need some help in testing the patch with the quirk. I was wondering if you'd still have that hardware around and would be able to help me with testing?

Thanks,

Rajat
èº{.nÇ+‰·Ÿ®‰­†+%ŠËlzwm…ébëæìr¸›zX§»®w¥Š{ayºÊÚë,j­¢f£¢·hš‹àz¹®w¥¢¸ ¢·¦j:+v‰¨ŠwèjØm¶Ÿÿ¾«‘êçzZ+ƒùšŽŠÝj"ú!¶iO•æ¬z·švØ^¶m§ÿðà nÆàþY&—