Re: [RFC PATCH] PCI: avoid SBR for NVIDIA T4

From: Wu Zongyong
Date: Mon Apr 03 2023 - 00:02:30 EST


On Fri, Mar 31, 2023 at 10:11:15AM +0800, Wu Zongyong wrote:
> On Thu, Mar 30, 2023 at 10:49:26AM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 30, 2023 at 10:10:16AM +0800, Wu Zongyong wrote:
> > > On Wed, Mar 29, 2023 at 12:05:15PM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Mar 29, 2023 at 07:58:45PM +0800, Wu Zongyong wrote:
> > > > > Secondary bus reset will fail if an NVIDIA T4 card is directly
> > > > > attached to a Root Port.
> > > >
> > > > Is this only a problem when direct attached to a Root Port? Why would
> > > > that be? If it's *not* related to being directly under a Root Port,
> > > > don't mention that at all.
> > >
> > > Yes, this problem occurs only when the T4 card is directly attached
> > > to a Root Port.
> > > I have tested it with a T4 card attached to a PCIe Switch or a PCI
> > > Bridge, and it works well.
> >
> > From an electrical and protocol point of view, the device should not
> > be able to tell the difference, so Lukas' suggestion that it may be
> > related to reset delays seems very pertinent.
> I will test it with the commits mentioned above.
> But it may take some time since it is not easy to replace the kernel in
> our environment.

I have tested Lukas' suggestion, and it did not fix the problem for T4 cards.

My base kernel is v5.10, and I cherry-picked the following patches:

730643d33e2d ("PCI/PM: Resume subordinate bus in bus type callbacks")
8ef0217227b4 ("PCI/PM: Observe reset delay irrespective of bridge_d3")
ac91e6980563 ("PCI: Unify delay handling for reset and resume")

Are there any other patches I should apply?

> >
> > Bjorn