RE: How long should be PCIe card in Warm Reset state?

From: David Laight
Date: Thu Mar 25 2021 - 05:19:26 EST


From: Amey Narkhede
> Sent: 23 March 2021 16:58
>
> On 21/03/23 05:27PM, Pali Rohár wrote:
> > On Tuesday 23 March 2021 21:49:41 Amey Narkhede wrote:
> > > On 21/03/10 12:05PM, Pali Rohár wrote:
> > > > Hello!
> > > >
> > > > I would like to open a question about PCIe Warm Reset. Warm Reset of
> > > > PCIe card is triggered by asserting PERST# signal and in most cases
> > > > PERST# signal is controlled by GPIO.
> > > >
> > > > Basically every native Linux PCIe controller driver is doing this Warm
> > > > Reset of connected PCIe card during native driver initialization
> > > > procedure.
> > > >
> > > > And now the important question is: How long should be PCIe card in Warm
> > > > Reset state? After which timeout can be PERST# signal de-asserted by
> > > > Linux controller driver?
> > > >
> > > > Lorenzo and Rob already expressed concerns [1] [2] that this Warm Reset
> > > > timeout should not be driver specific and I agree with them.
> > > >
> > > > I have done investigation which timeout is using which native PCIe
> > > > driver [3] and basically every driver is using different timeout.
> > > >
> > > > I have tried to find timeouts in PCIe specifications, I was not able to
> > > > understand and deduce correct timeout value for Warm Reset from PCIe
> > > > specifications. What I have found is written in my email [4].
> > > >
> > > > Alex (as a "reset expert"), could you look at this issue?
> > > >
> > > > Or is there somebody else who understands the PCIe specifications and
> > > > timing diagrams well enough to figure out the minimal timeout for
> > > > de-asserting the PERST# signal?
> > > >
> > > > There are still some issues with WiFi cards (e.g. Compex one) which
> > > > sometimes do not appear on PCIe bus. And based on these "reset timeout
> > > > differences" in Linux PCIe controller drivers, I suspect that it is not
> > > > (only) the problems in WiFi cards but also in Linux PCIe controller
> > > > drivers. In my email [3] I have written that I figured out that WLE1216
> > > > card needs to be in Warm Reset state for at least 10ms, otherwise card
> > > > is not detected.
> > > >
> > > > [1] - https://lore.kernel.org/linux-pci/20200513115940.fiemtnxfqcyqo6ik@pali/
> > > > [2] - https://lore.kernel.org/linux-pci/20200507212002.GA32182@bogus/
> > > > [3] - https://lore.kernel.org/linux-pci/20200424092546.25p3hdtkehohe3xw@pali/
> > > > [4] - https://lore.kernel.org/linux-pci/20200430082245.xblvb7xeamm4e336@pali/
> > >
> > > I somehow got my hands on PCIe Gen4 spec. It says on page no 555-
> > > "When PERST# is provided to a component or adapter, this signal must be
> > > used by the component or adapter as Fundamental Reset.
> > > When PERST# is not provided to a component or adapter, Fundamental Reset is
> > > generated autonomously by the component or adapter, and the details of how
> > > this is done are outside the scope of this document."
> > > Not sure what component/adapter means in this context.
> > >
> > > Then below it says-
> > > "In some cases, it may be possible for the Fundamental Reset mechanism
> > > to be triggered by hardware without the removal and re-application of
> > > power to the component. This is called a warm reset. This document does
> > > not specify a means for generating a warm reset."
> > >
> > > Thanks,
> > > Amey
> >
> > Hello Amey, PCIe Base document does not specify how to control PERST#
> > signal and how to issue Warm Reset. But it is documented in PCIe CEM,
> > Mini PCIe CEM and M.2 CEM documents (maybe in some other PCIe docs too).
> >
> > It is necessary to look into several documents, "merge them in head" and
> > then deduce the final meaning...
> Okay so PCIe CEM revision 2.0(from 2007) on page no 22 says-
> "On power up, the deassertion of PERST# is delayed 100 ms (TPVPERL)
> from the power rails achieving specified operating limits. Also, within
> this time, the reference clocks (REFCLK+, REFCLK-) also become stable,
> at least TPERST-CLK before PERST# is deasserted."
>
> Then below it says-
> "After there has been time (TPVPERL) for the power and clock to become
> stable, PERST# is deasserted high and the PCI Express functions can start
> up."
>
> And then there is table of timing on page no 33-
> Symbol      Parameter                               Min
> TPVPERL     Power stable to PERST# inactive         100 ms
> TPERST-CLK  REFCLK stable before PERST# inactive    100 μs
> TPERST      PERST# active time                      100 μs
> TFAIL       Power level invalid to PERST# active    500 ns
> ...
>
> I agree this is confusing.
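
As a minimal sketch, the minimums from the quoted CEM 2.0 table can be
written down as constants and checked against whatever a controller driver
actually does. The struct and function names are hypothetical and are not
kernel APIs. The sketch only encodes the quoted table; it does not settle
which value applies to a Warm Reset, and the WLE1216 report above (10 ms
needed) suggests the table minimums alone may not be enough for every card.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum values from the quoted CEM 2.0 table, in nanoseconds. */
#define T_PVPERL_MIN_NS    (100u * 1000u * 1000u) /* power stable to PERST# inactive: 100 ms */
#define T_PERST_CLK_MIN_NS (100u * 1000u)         /* REFCLK stable to PERST# inactive: 100 us */
#define T_PERST_MIN_NS     (100u * 1000u)         /* PERST# active time: 100 us */

/* Hypothetical record of the intervals a driver produced (not a kernel type). */
struct perst_timing {
    uint64_t power_stable_to_deassert_ns;
    uint64_t refclk_stable_to_deassert_ns;
    uint64_t perst_active_ns;
};

/* Returns true only if every interval meets the quoted table minimum. */
static bool perst_timing_meets_cem_minimums(const struct perst_timing *t)
{
    assert(t != NULL);
    return t->power_stable_to_deassert_ns >= T_PVPERL_MIN_NS &&
           t->refclk_stable_to_deassert_ns >= T_PERST_CLK_MIN_NS &&
           t->perst_active_ns >= T_PERST_MIN_NS;
}
```

A driver that holds PERST# for 10 ms passes the TPERST check with a wide
margin, while one that holds it for only 50 us does not.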

There is also the related issue of the time after reset is removed
before the target must respond to the first configuration cycle.

I can't see the value in the (nicely bound) copy of the PCI 2.0 spec I have.
But IIRC it is 100ms (it might just be 500ms).
While this might seem like ages it can be problematic if targets have
to load large FPGA images from serial EEPROMs.

Most x86 systems have lots of slow BIOS code so tend to be fine.
But some other systems can try to enumerate the PCIe target before
it is actually ready - causing semi-random failures.

Our current FPGAs do load the PCIe interface before most of the
logic, and can be configured to force retries of config space
accesses until fully loaded.
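
The host side of that can be sketched as a bounded poll of the Vendor ID
after reset is removed. This is only an illustration under assumptions: the
callback types, the fake device and the timeout values are all hypothetical,
and treating Vendor ID 0x0001 as "retry later" assumes the root port reports
retried config requests that way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_NO_RESPONSE 0xFFFFu /* all-ones: nothing answered the read */
#define VENDOR_ID_RETRY       0x0001u /* assumed "device asks for a retry" value */

/* Hypothetical hooks: a config-space Vendor ID read and a delay function. */
typedef uint16_t (*read_vendor_id_fn)(void *ctx);
typedef void (*sleep_ms_fn)(void *ctx, unsigned int ms);

/* Polls until a real Vendor ID shows up or timeout_ms has been slept away. */
static bool wait_for_device_ready(read_vendor_id_fn read_id, sleep_ms_fn sleep_ms,
                                  void *ctx, unsigned int timeout_ms,
                                  unsigned int poll_ms)
{
    unsigned int waited = 0;

    assert(read_id != NULL && sleep_ms != NULL && poll_ms > 0);
    for (;;) {
        uint16_t id = read_id(ctx);

        if (id != VENDOR_ID_NO_RESPONSE && id != VENDOR_ID_RETRY)
            return true;            /* target answered with a real ID */
        if (waited >= timeout_ms)
            return false;           /* gave up: target never became ready */
        sleep_ms(ctx, poll_ms);
        waited += poll_ms;
    }
}

/* Simulated target for illustration: ready once enough time has been slept. */
struct fake_device {
    unsigned int elapsed_ms;
    unsigned int ready_after_ms;
    uint16_t vendor_id;             /* arbitrary example value */
};

static uint16_t fake_read_vendor_id(void *ctx)
{
    struct fake_device *d = ctx;

    return d->elapsed_ms >= d->ready_after_ms ? d->vendor_id : VENDOR_ID_NO_RESPONSE;
}

static void fake_sleep_ms(void *ctx, unsigned int ms)
{
    struct fake_device *d = ctx;

    d->elapsed_ms += ms;            /* no real delay, just bookkeeping */
}
```

With a fake target that needs 300 ms to load and a 1000 ms timeout, the poll
succeeds; with a target that needs 5000 ms it gives up, which corresponds to
the semi-random enumeration failures described above.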

David
