Re: [PATCH V1] PCI: dwc: Use dev_info for PCIe link down event logging
From: Rob Herring
Date: Wed Oct 26 2022 - 14:06:44 EST
On Mon, Oct 10, 2022 at 1:02 AM Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
>
>
>
> On 10/4/2022 6:23 PM, Rob Herring wrote:
> > On Thu, Sep 15, 2022 at 9:52 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Sep 15, 2022 at 09:16:27AM -0500, Rob Herring wrote:
> >>> On Wed, Sep 14, 2022 at 1:24 AM Manivannan Sadhasivam
> >>> <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, Sep 13, 2022 at 03:07:46PM -0500, Bjorn Helgaas wrote:
> >>>>> On Tue, Sep 13, 2022 at 06:00:30PM +0100, Jon Hunter wrote:
> >>>>>> On 13/09/2022 17:51, Manivannan Sadhasivam wrote:
> >>>>>>> On Tue, Sep 13, 2022 at 03:42:37PM +0530, Vidya Sagar wrote:
> >>>>>>>> Some of the platforms (like Tegra194 and Tegra234) have open slots and
> >>>>>>>> not having an endpoint connected to the slot is not an error.
> >>>>>>>> So, changing the macro from dev_err to dev_info to log the event.
> >>>>>>>
> >>>>>>> But the link up not happening is an actual error and -ETIMEDOUT is being
> >>>>>>> returned. So I don't think the log severity should be changed.
> >>>>>>
> >>>>>> Yes it is an error in the sense it is a timeout, but reporting an error
> >>>>>> because nothing is attached to a PCI slot seems a bit noisy. Please note
> >>>>>> that a similar change was made by the following commit and it also seems
> >>>>>> appropriate here ...
> >>>>>>
> >>>>>> commit 4b16a8227907118e011fb396022da671a52b2272
> >>>>>> Author: Manikanta Maddireddy <mmaddireddy@xxxxxxxxxx>
> >>>>>> Date: Tue Jun 18 23:32:06 2019 +0530
> >>>>>>
> >>>>>> PCI: tegra: Change link retry log level to debug
> >>>>>>
> >>>>>>
> >>>>>> BTW, we check for error messages in the dmesg output and this is a new error
> >>>>>> seen as of Linux v6.0 and so this was flagged in a test. We can ignore the
> >>>>>> error, but in this case it seems more appropriate to make this an info or
> >>>>>> debug level print.
> >>>>>
> >>>>> Can you tell whether there's a device present, e.g., via Slot Status
> >>>>> Presence Detect? If there's nothing in the slot, I don't know why we
> >>>>> would print anything at all. If a card is present but there's no
> >>>>> link, that's probably worthy of dev_info() or even dev_err().
> >>>>>
> >>>>
> >>>> I don't think all form factors allow for the PRSNT pin to be wired up,
> >>>> so we cannot know if the device is actually present in the slot or not all
> >>>> the time. Maybe we should check it when the form factor supports it?
> >>>>
> >>>>> I guess if you can tell the slot is empty, there's no point in even
> >>>>> trying to start the link, so you could avoid both the message and the
> >>>>> timeout by not even calling dw_pcie_wait_for_link().
> >>>>
> >>>> Right. There is an overhead of waiting for ~1ms during boot.
> >>>
> >>> Async probe should mitigate that, right? Saravana is working toward
> >>> making that the default instead of opt in, but you could opt in now.
> >>>
> >>
> >> No. The delay is due to the DWC core waiting for link up that depends on
> >> the PCIe device to be present on the slot.
> >
> > Yes, I understand that already.
> >
> >> The driver probe order
> >> doesn't apply here.
> >
> > I'm not talking about probe order, but rather async probe enabling
> > parallel probing. If waiting for the link happens asynchronously, then
> > other probes can happen in parallel and you won't see the delay (until
> > you run out of cores or all the other probes are faster).
>
> Are you suggesting to break the existing probe of DWC based PCIe
> platform drivers into two, i.e. a sync part that handles the sequence up
> until link up and an async part that starts after the link is up (or after
> LIKUP_TIMEOUT if the link doesn't come up)?
No, just make the driver opt in to async probe. It's 1 flag to set for
the driver. Then the delay in your probe is not blocking other probes
and the whole probe for the driver happens in parallel. Then the delay
is only an issue if it is longer than all the other things
initializing during boot or if you are on a single core system.
Neither is likely true.
Rob