Re: [PATCH v7 3/5] Add debugfs based silicon debug support in DWC

From: Bjorn Helgaas
Date: Wed Mar 05 2025 - 12:38:37 EST


On Tue, Mar 04, 2025 at 10:41:54PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Mar 05, 2025 at 12:46:38AM +0900, Krzysztof Wilczyński wrote:
> > > On Mon, 3 Mar 2025 at 20:47, Krzysztof Wilczyński <kw@xxxxxxxxx> wrote:
> > > > [...]
> > > > > > +int dwc_pcie_debugfs_init(struct dw_pcie *pci)
> > > > > > +{
> > > > > > + char dirname[DWC_DEBUGFS_BUF_MAX];
> > > > > > + struct device *dev = pci->dev;
> > > > > > + struct debugfs_info *debugfs;
> > > > > > + struct dentry *dir;
> > > > > > + int ret;
> > > > > > +
> > > > > > + /* Create main directory for each platform driver */
> > > > > > + snprintf(dirname, DWC_DEBUGFS_BUF_MAX, "dwc_pcie_%s", dev_name(dev));
> > > > > > + dir = debugfs_create_dir(dirname, NULL);
> > > > > > + debugfs = devm_kzalloc(dev, sizeof(*debugfs), GFP_KERNEL);
> > > > > > + if (!debugfs)
> > > > > > + return -ENOMEM;
> > > > > > +
> > > > > > + debugfs->debug_dir = dir;
> > > > > > + pci->debugfs = debugfs;
> > > > > > + ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
> > > > > > + if (ret)
> > > > > > + dev_dbg(dev, "RASDES debugfs init failed\n");
> > > > >
> > > > > What will happen if ret != 0? still return 0?
> > >
> > > And that is exactly what happens on Gray Hawk Single with R-Car
> > > V4M: dw_pcie_find_rasdes_capability() returns NULL, causing
> > > dwc_pcie_rasdes_debugfs_init() to return -ENODEV.
> > >
> > > Debugfs issues should never be propagated upstream!
> ...

> > > So while applying, you changed this like:
> > >
> > > ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
> > > - if (ret)
> > > - dev_dbg(dev, "RASDES debugfs init failed\n");
> > > + if (ret) {
> > > + dev_err(dev, "failed to initialize RAS DES debugfs\n");
> > > + return ret;
> > > + }
> > >
> > > return 0;
> > >
> > > Hence this is now a fatal error, causing the probe to fail.

> Even though debugfs_init() failure is not supposed to fail the probe(),
> dwc_pcie_rasdes_debugfs_init() has a devm_kzalloc() and propagating that
> failure would be canonically correct IMO.

I'm not sure about this. What's the requirement to propagate
devm_kzalloc() failures? I think devres will free any allocs that
were successful regardless.

IIUC, we resolved the Gray Hawk Single issue by changing
dwc_pcie_rasdes_debugfs_init() to return success without doing
anything when there's no RAS DES Capability.
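For illustration, here is a minimal userspace sketch of that behavior. The fake_* names and struct fake_dw_pcie are hypothetical stand-ins, not code from the patch, and the 0x100 capability offset is arbitrary:

```c
/*
 * Hypothetical userspace stand-in for struct dw_pcie. has_rasdes models
 * whether the controller exposes the RAS DES Capability.
 */
struct fake_dw_pcie {
	int has_rasdes;
	int rasdes_files_created;
};

/*
 * Hypothetical stand-in for dw_pcie_find_rasdes_capability(): returns a
 * nonzero capability offset when present and 0 when absent.
 */
static unsigned short fake_find_rasdes_capability(const struct fake_dw_pcie *pci)
{
	return pci->has_rasdes ? 0x100 : 0;
}

/* A missing capability means "nothing to expose", not an error. */
static int fake_rasdes_debugfs_init(struct fake_dw_pcie *pci)
{
	unsigned short ras_cap = fake_find_rasdes_capability(pci);

	if (!ras_cap)
		return 0;	/* e.g., R-Car V4M: skip RAS DES files quietly */

	/* Real code would create the RAS DES debugfs files here. */
	pci->rasdes_files_created = 1;
	return 0;
}
```

Both cases return 0; only the presence of the debugfs files differs.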

But dwc_pcie_debugfs_init() can still return failure, and that still
causes dw_pcie_ep_init_registers() to fail, which breaks the "don't
propagate debugfs issues upstream" rule:

int dw_pcie_ep_init_registers(struct dw_pcie_ep *ep)
{
	...
	ret = dwc_pcie_debugfs_init(pci);
	if (ret)
		goto err_remove_edma;

	return 0;

err_remove_edma:
	dw_pcie_edma_remove(pci);

	return ret;
}

We can say that kzalloc() failure should "never" happen, and therefore
it's OK to fail the driver probe if it happens, but that doesn't seem
like a strong argument for breaking the "don't propagate debugfs
issues" rule. And someday there may be other kinds of failures from
dwc_pcie_debugfs_init().
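One possible shape that follows the rule, sketched in userspace C: the debugfs result is reported but never reaches err_remove_edma. The fake_* names and struct fake_ep are hypothetical stand-ins, not code from the patch; an alternative would be to make dwc_pcie_debugfs_init() return void altogether:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the controller state relevant here. */
struct fake_ep {
	int simulate_enomem;	/* force the debugfs allocation to "fail" */
	int debugfs_ready;
};

/* Stand-in for dwc_pcie_debugfs_init(); may fail like devm_kzalloc(). */
static int fake_debugfs_init(struct fake_ep *ep)
{
	if (ep->simulate_enomem)
		return -ENOMEM;

	ep->debugfs_ready = 1;
	return 0;
}

/*
 * Models the tail of dw_pcie_ep_init_registers(): a debugfs failure is
 * logged and ignored, so it never triggers the eDMA teardown path.
 */
static int fake_ep_init_registers(struct fake_ep *ep)
{
	int ret = fake_debugfs_init(ep);

	if (ret)
		fprintf(stderr, "debugfs init failed: %d (ignored)\n", ret);

	return 0;
}
```

With this shape, any future failure mode in dwc_pcie_debugfs_init() only costs the debugfs files, not the endpoint.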

Bjorn