Re: [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW controller

From: 'Manivannan Sadhasivam'
Date: Tue Mar 19 2024 - 12:33:40 EST


On Thu, Feb 22, 2024 at 04:30:47PM +0530, Shradha Todi wrote:
> + Borislav, Tony, James, Mauro, Robert
>
> Hi All,
>
> Synopsys DesignWare PCIe controllers have a vendor specific capability (which
> means that this set of registers are only present in DesignWare controllers)
> to perform debug operations called "RASDES".
> The functionalities provided by this extended capability are:
>
> 1. Debug: This has some debug related diagnostic features like holding LTSSM
> in certain states, reading the status of lane detection, checking if any PCIe
> lanes are broken (RX Valid) and so on. It's a debug only feature used for diagnostic
> use-cases.
>
> 2. Error Injection: This is a way to inject certain errors in PCIe like LCRC, ECRC,
> Bad TLPs and so on. Again, this is a debug feature and generally not used in
> functional use-case.
>
> 3. Statistical counters: This has 3 parts
> - Error counters
> - Non error counters (covered as part of perf [1])
> - Time based analysis counters (covered as part of perf [1])
>
> Selective features of the above functionality has been implemented
> by vendor specific PCIe controller drivers (pcie-tegra194.c) that use
> Synopsys DesignWare PCIe controllers.
> In order to make it useful to all vendors using DWC controller, we had
> proposed a common implementation in DWC PCIe controller directory
> (drivers/pci/controller/dwc/) and our original idea was based on debugfs
> filesystem. v1 and v2 are mentioned in [2] and [3].
>
> We got a suggestion to implement this as part of EDAC framework [3] and
> we looked into the same. But as far as I understood, what I am trying to
> implement is a very specific feature (only valid for Synopsys DWC PCIe controllers).
> This doesn't seem to fit in very well with the EDAC framework and we can
> hardly use any of the EDAC framework APIs. We tried implementing a
> "pci_driver" but since a function driver will already be running on the EP and
> portdrv on the root-complex, we will not be able to bind 2 drivers to a single
> PCI device (root-complex or endpoint). Ultimately, what I will be doing is
> writing a platform driver with debugfs entries which will be present in EDAC
> directory instead of DWC directory.
>
> Can you please help us out by going through this thread [3] and letting us
> know if our understanding is wrong at any point. If you think it is a better
> idea to integrate this in the EDAC framework, can you guide me as
> to how I can utilize the framework better?
> Please let me know if you need any other information to conclude.
>
> [1] https://lore.kernel.org/linux-pci/20231121013400.18367-1-xueshuai@xxxxxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/all/20210518174618.42089-1-shradha.t@xxxxxxxxxxx/T/
> [3] https://lore.kernel.org/all/20231130115044.53512-1-shradha.t@xxxxxxxxxxx/
>

Gentle ping for the EDAC maintainers.

- Mani

> Thanks,
> Shradha
>
> > -----Original Message-----
> > From: 'Manivannan Sadhasivam' <manivannan.sadhasivam@xxxxxxxxxx>
> > Sent: 16 February 2024 19:19
> > To: Shradha Todi <shradha.t@xxxxxxxxxxx>
> > Cc: lpieralisi@xxxxxxxxxx; kw@xxxxxxxxx; robh@xxxxxxxxxx;
> > bhelgaas@xxxxxxxxxx; jingoohan1@xxxxxxxxx;
> > gustavo.pimentel@xxxxxxxxxxxx; josh@xxxxxxxxxxxxxxxx;
> > lukas.bulwahn@xxxxxxxxx; hongxing.zhu@xxxxxxx;
> > pankaj.dubey@xxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-
> > pci@xxxxxxxxxxxxxxx; vidyas@xxxxxxxxxx; gost.dev@xxxxxxxxxxx
> > Subject: Re: [PATCH v2 0/3] Add support for RAS DES feature in PCIe DW
> > controller
> >
> > On Thu, Feb 15, 2024 at 02:55:06PM +0530, Shradha Todi wrote:
> > >
> > >
> >
> > [...]
> >
> > > > For the error injection and counters, we already have the EDAC
> > > > framework. So adding them in the DWC driver doesn't make sense to me.
> > > >
> > >
> > > Sorry for late response, was going through the EDAC framework to understand
> > better how we can fit RAS DES support in it. Below are some technical challenges
> > found so far:
> > > 1: This debugfs framework proposed [1] can run on both side of the link i.e. RC
> > and EP as it will be a part of the link controller platform driver. Here for the EP
> > side the assumption is that it has Linux running, which is primarily a use case for
> > chip-to-chip communication. After your suggestion to migrate to EDAC
> > framework we studied and here are the findings:
> > > - If we move to EDAC framework, we need to have RAS DES as a
> > > pci_driver which will be binded based on vendor_id and device_id. Our
> > > observation is that on EP side system we are unable to bind two
> > > function driver (pci_driver), as pci_endpoint_test function driver or
> > > some other chip-to-chip function driver will already be bound. On the
> > > other hand, on RC side we observed that if we have portdrv enabled in
> > > Linux running on RC system, it gets bound to RC controller and then it
> > > does not allow EDAC pci_driver to bind. So basically we see a problem
> > > here, that we can't have two pci_driver binding to same PCI device
> > > 2: Another point is even though we use EDAC driver framework, we may not be
> > able to use any of EDAC framework APIs as they are mostly suitable for memory
> > controller devices sitting on PCI BUS. We will end up using debugfs entries just via
> > a pci_driver placed inside EDAC framework.
> >
> > Please wrap your replies to 80 characters.
> >
> > There is no need to bind the edac driver to VID:PID of the device. The edac driver
> > can be a platform driver and you can instantiate the platform device from the
> > DWC driver. This way, the PCI device can be assocaited with whatever driver, but
> > still there can be a separate edac driver for handling errors.
> >
> > Regarding API limitation, you should ask the maintainer about the possibility of
> > extending them.
> >
> > >
> > > Please let me know if my understanding is wrong.
> > >
> > > > But first check with the perf driver author if they have any plans
> > > > on adding the proposed functionality. If they do not have any plan
> > > > or not working on it, then look into EDAC.
> > > >
> > > > - Mani
> > > >
> > >
> > > Since we already worked and posted patches [1], [2], we will continue to work
> > on this and based on consent from community we will adopt to most suitable
> > framework.
> > > We see many subsystems like ethernet, usb, gpu, cxl having debugfs files that
> > give information about the current status of the running system and as of now
> > based on our findings, we still feel there is no harm in having debugfs entry based
> > support in DesignWare controller driver itself.
> >
> > There is no issue in exposing the debug information through debugfs, that's the
> > sole purpose of the interface. But here, you are trying to add support for DWC
> > RAS feature for which a dedicated framework already exists.
> >
> > And there will be more similar requests coming for vendor specific error protocols
> > as well. So your investigation could benefit everyone.
> >
> > From your above investigation, looks like there are some shortcomings of the
> > EDAC framework. So let's get that clarified by writing to the EDAC maintainers
> > (keep us in CC). If the EDAC maintainer suggests you to add support for this
> > feature in DWC driver itself citing some reasons, then no issues with me.
> >
> > - Mani
> >
> > --
> > மணிவண்ணன் சதாசிவம்
>
>

--
மணிவண்ணன் சதாசிவம்