Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

From: Vincenzo Palazzo
Date: Fri May 12 2023 - 06:46:33 EST

> > > Many years ago we ran that issue to ground and with Robin Murphy's
> > > help we found that while it's possible to gracefully handle that
> > > condition it required hijacking the entire arm64 error handling
> > > routine. Not exactly scalable for just one SoC.
> >
> > Do you have a pointer to that discussion? The URL might save
> > repeating the whole exercise and could be useful for the commit log
> > when we try to resolve this.
> The link to the patch email is here, the full discussion is pretty
> easy to follow:
> Also:

I have some concerns about the patch proposed in the email that you shared. It seems
quite extensive (it touches code that is not just related to the HW) just to fix a hardware
issue. I would have expected the fix to be integrated into the driver itself,
so that if the hardware dies at some point in the future, the workaround code
dies with it.

However, it is possible that I may have missed something in the patch,
and my thoughts could be wrong.

> >
> > > The configurable waits allow us to program reasonable times for
> > > 90% of the endpoints that come up in the normal amount of time, while
> > > being able to adjust it for the other 10% that do not. Some require
> > > multiple seconds before they return without error. Part of the reason
> > > we don't want to hardcode the wait time is because the probe isn't
> > > handled asynchronously, so the kernel appears to hang while waiting
> > > for the timeout.
> >
> > Is there some way for users to figure out that they would need this
> > property? Or is it just "if your kernel panics on boot, try
> > adding or increasing "bus-scan-delay-ms" in your DT?
> There's a listing of tested cards at:
> Most cards work fine that don't require a large BAR. PCIe switches are
> completely dead without the above hack patch. Cards that lie in the
> middle are ones that expect BIOS / EFI support to initialize, or ones
> that have complex boot roms and don't initialize quickly.
> But yes, it's unfortunately going to be "if you panic, increase the
> delay" unless a more complete database of cards can be generated.

This is really unfortunate because, as mentioned in some previous emails,
adding sleep time slows down the kernel. Is there any way to tell the
kernel "hey, we need some more time here", or in computer science
terms, to load a driver asynchronously?
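
To make the question concrete, here is a rough user-space sketch of what I
mean by "asynchronously". It is not kernel code and not a proposal for this
driver: scan_job, scan_worker() and boot_with_async_scan() are hypothetical
names, and the sleep only stands in for the configurable scan delay. In the
kernel, the closest existing knob I am aware of is setting
.probe_type = PROBE_PREFER_ASYNCHRONOUS in struct device_driver; I have not
checked whether that is safe for this driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for one slow bus scan. */
struct scan_job {
	unsigned int delay_ms;	/* models the configurable scan delay */
	int result;		/* set to 0 by the worker on success */
};

static void sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/* Runs on its own thread: wait out the delay, then "scan" the bus. */
static void *scan_worker(void *arg)
{
	struct scan_job *job = arg;

	sleep_ms(job->delay_ms);
	job->result = 0;
	return NULL;
}

/*
 * Start the slow scan in the background, carry on with unrelated init
 * steps, and only then wait for the scan to finish.
 * Returns the number of other steps completed, or -1 on failure.
 */
int boot_with_async_scan(unsigned int delay_ms, int other_steps)
{
	struct scan_job job = { .delay_ms = delay_ms, .result = -1 };
	pthread_t tid;
	int steps_done = 0;

	assert(other_steps >= 0);

	if (pthread_create(&tid, NULL, scan_worker, &job) != 0)
		return -1;

	for (int i = 0; i < other_steps; i++)
		steps_done++;	/* stand-in for other init work */

	pthread_join(tid, NULL);
	return job.result == 0 ? steps_done : -1;
}
```

The point is only that the other init steps are not stuck behind the
delay: the wait still happens, but on a separate thread, so the rest of
the boot does not appear to hang.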