RE: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

From: Hongxing Zhu

Date: Tue Apr 14 2026 - 04:02:20 EST

> -----Original Message-----
> From: Manivannan Sadhasivam <mani@xxxxxxxxxx>
> Sent: 2026年4月14日 0:35
> To: Brian Norris <briannorris@xxxxxxxxxxxx>; Hongxing Zhu
> <hongxing.zhu@xxxxxxx>
> Cc: Hongxing Zhu <hongxing.zhu@xxxxxxx>;
> manivannan.sadhasivam@xxxxxxxxxxxxxxxx; Bjorn Helgaas
> <bhelgaas@xxxxxxxxxx>; Mahesh J Salgaonkar <mahesh@xxxxxxxxxxxxx>;
> Oliver O'Halloran <oohall@xxxxxxxxx>; Will Deacon <will@xxxxxxxxxx>;
> Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx>; Krzysztof Wilczyński
> <kwilczynski@xxxxxxxxxx>; Rob Herring <robh@xxxxxxxxxx>; Heiko Stuebner
> <heiko@xxxxxxxxx>; Philipp Zabel <p.zabel@xxxxxxxxxxxxxx>;
> linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linuxppc-dev@xxxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> linux-arm-msm@xxxxxxxxxxxxxxx; linux-rockchip@xxxxxxxxxxxxxxxxxxx; Niklas
> Cassel <cassel@xxxxxxxxxx>; Wilfred Mallawa <wilfred.mallawa@xxxxxxx>;
> Krishna Chaitanya Chundru <krishna.chundru@xxxxxxxxxxxxxxxx>; Lukas
> Wunner <lukas@xxxxxxxxx>; Wilson Ding <dingwei@xxxxxxxxxxx>; Miles
> Chen <minhuachen@xxxxxxxxxx>
> Subject: Re: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a
> platform specific way
>
> Hi Brian,
>
> On Wed, Apr 08, 2026 at 06:58:34PM -0700, Brian Norris wrote:
> > Hi Richard and Mani,
> >
> > For the record, I've been using a form of an earlier version of this
> > patchset in my environment for some time now, and I've run across
> > problems that *might* relate to what Richard is reporting, but I'm not
> > quite sure at the moment. Details below.
> >
> > On Wed, Mar 25, 2026 at 07:06:49AM +0000, Hongxing Zhu wrote:
> > > Hi Mani:
> > > I've accidentally encountered a new issue based on the reset root port
> patch-set.
> > > After performing a few hot-reset operations, the PCIe link enters a
> continuous up/down cycling pattern.
> > >
> > > I found that calling pci_reset_secondary_bus() first in
> pcibios_reset_secondary_bus() appears to resolve this issue.
> > > Have you experienced a similar problem?
> > >
> > > "
> > > ...
> > > [ 141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700)
> > > link up detected [ 142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3
> > > x1 link up [ 142.092038] imx6q-pcie 4c300000.pcie:
> > > PCIe(LNK_STS:0x00000c00) link down detected ...
> > > "
> > >
> > > Platform: i.MX95 EVK board plus local Root Ports reset supports based
> on the #1 and #2 patches of v7 patch-set.
> > > Notes of the logs:
> > > - One Gen3 NVME device is connected.
> > > - "./memtool 4c341058=0;./memtool 4c341058=1;" is used to toggle the
> LTSSM_EN bit to trigger the link down.
> > > - Toggle BIT6 of Bridge Control Register to trigger hot reset by
> "./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;"
> > > - The Root Port reset patches works correctly at first.
> > > However, after several hot-reset triggers, the link enters a repeated
> down/up cycling state.
> > >
> > > Logs:
> > > [ 3.553188] imx6q-pcie 4c300000.pcie: host bridge
> /soc/pcie@4c300000 ranges:
> > > [ 3.560308] imx6q-pcie 4c300000.pcie: IO
> 0x006ff00000..0x006fffffff -> 0x0000000000
> > > [ 3.568525] imx6q-pcie 4c300000.pcie: MEM
> 0x0910000000..0x091fffffff -> 0x0010000000
> > > [ 3.577314] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 ==
> cpu 0x60100000
> > > [ 3.796029] imx6q-pcie 4c300000.pcie: iATU: unroll T, 128 ob, 128 ib,
> align 4K, limit 1024G
> > > [ 4.003746] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> > > [ 4.009553] imx6q-pcie 4c300000.pcie: PCI host bridge to bus 0000:00
> > > root@imx95evk:~#
> > > root@imx95evk:~#
> > > root@imx95evk:~# ./memtool 4c341058=0;./memtool 4c341058=1;
> Writing
> > > 32-bit value 0x0 to address 0x4C341058 Writing 32-bit v
> > > [ 87.265348] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d01)
> link down detected
> > > alue 0x1 to adder
> > > [ 87.273106] imx6q-pcie 4c300000.pcie: Stop root bus and handle link
> down
> > > ss 0x4C341058
> > > [ 87.281264] pcieport 0000:00:00.0: Recovering Root Port due to Link
> Down
> > > [ 87.289245] pci 0000:01:00.0: AER: can't recover (no error_detected
> callback)
> > > root@imx95evk:~# [ 87.514216] imx6q-pcie 4c300000.pcie:
> PCIe(LNK_STS:0x00000700) link up detected
> > > [ 87.702968] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up
> > > [ 87.834983] pcieport 0000:00:00.0: Root Port has been reset
> > > [ 87.840714] pcieport 0000:00:00.0: AER: device recovery failed
> > > [ 87.846592] imx6q-pcie 4c300000.pcie: Rescan bus after link up is
> detected
> > > [ 87.855947] pcieport 0000:00:00.0: bridge configuration invalid ([bus
> 00-00]), reconfiguring
> >
> > I've seen this same line ("bridge configuration invalid") before, and
> > I believe that's because the saved state (pci_save_state(); more about
> > this below) is invalid -- it contains 0 values in places where they
> > should be non-zero. So when those values are restored
> > (pci_restore_state()), we get confused.
> >
> > I believe we've pinned down one reason this invalid state occurs --
> > it's because of an automatic (mis)feature in the DesignWare PCIe
> hardware.
> > Specifically, it's because of what the controller does during a
> > surprise link-down error.
> >
> > From the Designware docs:
> >
> > "[...] during normal operation, the link might fail and go down. After
> > this link-down event, the controller requests the DWC_pcie_clkrst.v
> > module to hot-reset the controller. There is no difference in the
> > handling of a link-down reset or a hot reset; the controller asserts
> > the link_req_rst_not output requesting the DWC_pcie_clkrst.v module
> to
> > reset the controller."
> >
> > In some of the adjacent documentation (and confirmed in local
> > testing), it suggests that this automatic reset will also reset
> > various DBI (i.e., PCIe config space) registers. It also seems as if
> > there's not really a good way to completely stop this automatic reset
> > -- the docs mention some SW methods prevent the reset, but they all
> seem racy or incomplete.
> >
> > Anyway, I think this implies that patch 1 is somewhat wrong [1]. It
> > includes some code like this:
> >
> > pci_save_state(dev);
> > ret = host->reset_root_port(host, dev);
> > if (ret)
> > pci_err(dev, "Failed to reset Root Port: %d\n", ret);
> > else
> > /* Now restore it on success */
> > pci_restore_state(dev);
> >
> > That first line (pci_save_state()) is prone to saving invalid state,
> > depending on whether the link-down event has finished flushing and
> > resetting the controller yet or not. The resulting impact is a bit
> > hard to judge, depending on what (mis)configuration you end up with.
> >
>
> Thanks a lot for your investigation. I think your observation makes sense and
> could be the culprit in saving the corrupted state. Even on non-DWC
> controllers, there is no guarantee that the Root Port config registers state will
> be preserved after LDn (before Root Port reset).
>
> > I also noticed commit a2f1e22390ac ("PCI/ERR: Ensure error
> > recoverability at all times") was merged recently. With that change, I
> > believe it is now safe to perform pci_restore_state() even without
> > pci_save_state() here.
> >
> > So ... can we remove pci_save_state() from
> > pcibios_reset_secondary_bus()? Might that help?
>
> I think so. I will also test it locally and report back soon.
>
> > It sounds like my above
> > observations *may* match Richard's reports, but I'm not sure. And
> > anyway, the documented hardware behavior is racy, so it's hard to
> > propose a foolproof solution.
> >
>
> @Richard: Can you confirm if removing 'pci_save_state(dev);' from
> pcibios_reset_secondary_bus() fixes your issue?
I have tested the hot reset trigger hundreds of times, and it works
consistently.
@Brian Norris @Manivannan Sadhasivam
Thanks a lot for your helps.

Best Regards
Richard Zhu
>
> - Mani
>
> --
> மணிவண்ணன் சதாசிவம்