Re: [PATCH V1] PCI: designware-ep: Fix DBI access before core init

From: Bjorn Helgaas
Date: Fri Jul 29 2022 - 18:44:12 EST


[+cc Xiaowei (author of 6bfc9c3a2c70), Hou (author of 8bcca2658558)]

On Thu, Jul 28, 2022 at 05:56:28PM +0530, Vidya Sagar wrote:
> On 7/28/2022 3:44 AM, Bjorn Helgaas wrote:
> > On Wed, Jun 22, 2022 at 09:31:33AM +0530, Vidya Sagar wrote:
> > > Platforms that cannot support their core initialization without the
> > > reference clock from the host, implement the feature 'core_init_notifier'
> > > to indicate the DesignWare sub-system about when their core is getting
> > > initialized. Any accesses to the core (Ex:- DBI) would result in system
> > > hang in such systems (Ex:- tegra194). This patch moves any access to the
> > > core to dw_pcie_ep_init_complete() API which is effectively called only
> > > after the core initialization.
> >
> > I assume this is still broken. I want to fix it. I assume this patch
> > fixes it and there are no known problems with it. I assume this can
> > be fixed so it works on all platforms, whether they use
> > core_init_notifier or not.
>
> Yes. All your assumptions are correct.
>
> > I'd like the commit log to be specific about where the hang occurs so
> > it's easy for a non-DesignWare expert (me!) to see the problem. E.g.,
> > on tegra194, X depends on Y, but Y is initialized after X. Say
> > specifically what functions X and Y are.
>
> X = DBI accesses
> Y = Core initialization which in turn depends on the REFCLK from the host
>
> Without this patch, hang happens when DBI registers are accessed without
> core being initialized. In the case of Tegra194 at least, core gets
> initialized only after REFCLK is available from the host. The way we make
> sure that the REFCLK is available from the host is by checking for PERST#
> de-assertion interrupt. (PCIe spec mandates that the host must supply REFCLK
> before de-asserting PERST# signal).
> This patch prevents any accesses to the DBI/Core registers if the platform
> says that it supports core_init_notifier.

There are several commits involved in this, oldest to newest:

e966f7390da9 ("PCI: dwc: Refactor core initialization code for EP mode")
Split EP initialization into dw_pcie_ep_init() (which doesn't
touch core registers) and dw_pcie_ep_init_complete() (which does).

dw_pcie_ep_init
of_property_read_u32(np, "num-ib-windows", &ep->num_ib_windows)
ep->ib_window_map = devm_kcalloc(..., ep->num_ib_windows, ...)
dw_pcie_ep_init_complete # core regs are available
dw_pcie_readb_dbi # <-- read core regs

6bfc9c3a2c70 ("PCI: designware-ep: Move the function of getting MSI capability forward")
Move MSI capability lookup from dw_pcie_ep_init_complete() to
dw_pcie_ep_init().

dw_pcie_ep_init
dw_pcie_find_capability(pci, PCI_CAP_ID_MSI)
dw_pcie_readw_dbi # <-- read core regs
dw_pcie_ep_init_complete # core regs are available

This is broken because we access the DBI registers before the core
is initialized.

281f1f99cf3a ("PCI: dwc: Detect number of iATU windows")
Read num_ib_windows from iATU registers instead of from DT
property.

dw_pcie_ep_init
ib_window_map = devm_kcalloc(..., pci->num_ib_windows, ...) # use
dw_pcie_ep_init_complete
dw_pcie_setup
dw_pcie_iatu_detect_regions
dw_pcie_readl_dbi
pci->num_ib_windows = ib # init

This ordering is broken because we use pci->num_ib_windows before
we initialize it.

8bcca2658558 ("PCI: dwc: Move iATU detection earlier")
Fix the use-before-init problem by moving init earlier:

dw_pcie_ep_init
dw_pcie_iatu_detect
dw_pcie_readl_dbi # <-- read core regs
pci->num_ib_windows = ib # init
ib_window_map = devm_kcalloc(..., pci->num_ib_windows, ...) # use
dw_pcie_ep_init_complete # core regs are available
dw_pcie_setup

This fixed the use-before-init, but it causes the hang by
accessing the DBI registers before the core is initialized.

This patch addresses the problems of 6bfc9c3a2c70 and 8bcca2658558 by
moving dw_pcie_iatu_detect() and dw_pcie_ep_find_capability() from
dw_pcie_ep_init() to dw_pcie_ep_init_complete().

We probably need Fixes: tags for both.

But I don't think this patch is enough, and I have a lot of questions:

1) The "core_init_notifier" mechanism is kind of obtuse. IIUC, any
driver that sets ".core_init_notifier = true" is responsible for
calling dw_pcie_ep_init_complete() at some point. But that's not
explicit at all in the code. There's no "notifier" involved. I
suppose the idea is that the *interrupt* is the notifier, but the
naming doesn't help connect things.

2) I still want to know exactly where (what function) the critical
transition that enables DBI access is. I guess it's
qcom_pcie_perst_deassert() for qcom and
pex_ep_event_pex_rst_deassert() for tegra? And for drivers that
don't set .core_init_notifier, DBI is accessible any time in
dw_pcie_ep_init() or later?

3) How do these interrupt handlers work? I'm used to the model of:

Install interrupt handler
Do something to the device so it will generate an interrupt
Later, the device generates the interrupt
Execute the interrupt handler

We register tegra_pcie_ep_pex_rst_irq() as an interrupt handler,
but what touches the device and starts the process that causes
the interrupt?

4) 6bfc9c3a2c70 moved the MSI capability lookup to dw_pcie_ep_init()
because .ep_init() functions, e.g., ls_pcie_ep_init(), depend on
ep_func->msi_cap. Moving the MSI lookup to
dw_pcie_ep_init_complete() will break that again, won't it?

5) dw_pcie_ep_init() calls ep->ops->ep_init(ep), and almost all of
those .ep_init() methods, e.g., dra7xx_pcie_ep_init(),
ls_pcie_ep_init(), etc., call dw_pcie_ep_reset_bar(), which does
DBI writes. These are before dw_pcie_ep_init_complete(), so they
are broken per the design (though they probably *work* because
the drivers don't set .core_init_notifier, so DBI accesses
probably work earlier).

The only .ep_init() in a driver that sets .core_init_notifier is
qcom_pcie_ep_init(). That *looks* like it should actually be
broken because it runs before dw_pcie_ep_init_complete().

6) What's going on with the CORE_INIT and LINK_UP notifiers?
dw_pcie_ep_init_notify() is only called by qcom and tegra.
dw_pcie_ep_linkup() is only called by dra7xx, qcom, and tegra.
As far as I can tell, nobody at all registers to handle those
events except a test. I think it's pointless to have that code
if nobody uses it.

> > > ---
> > > .../pci/controller/dwc/pcie-designware-ep.c | 88 +++++++++++--------
> > > 1 file changed, 49 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > index 0eda8236c125..9feec720175f 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > @@ -639,9 +639,14 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap)
> > > int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> > > {
> > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > + struct dw_pcie_ep_func *ep_func;
> > > + struct device *dev = pci->dev;
> > > + struct pci_epc *epc = ep->epc;
> > > unsigned int offset;
> > > unsigned int nbars;
> > > u8 hdr_type;
> > > + u8 func_no;
> > > + void *addr;
> > > u32 reg;
> > > int i;
> > >
> > > @@ -654,6 +659,42 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep)
> > > return -EIO;
> > > }
> > >
> > > + dw_pcie_iatu_detect(pci);
> > > +
> > > + ep->ib_window_map = devm_kcalloc(dev,
> > > + BITS_TO_LONGS(pci->num_ib_windows),
> > > + sizeof(long),
> > > + GFP_KERNEL);
> > > + if (!ep->ib_window_map)
> > > + return -ENOMEM;
> > > +
> > > + ep->ob_window_map = devm_kcalloc(dev,
> > > + BITS_TO_LONGS(pci->num_ob_windows),
> > > + sizeof(long),
> > > + GFP_KERNEL);
> > > + if (!ep->ob_window_map)
> > > + return -ENOMEM;
> > > +
> > > + addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> > > + GFP_KERNEL);
> > > + if (!addr)
> > > + return -ENOMEM;
> > > + ep->outbound_addr = addr;
> > > +
> > > + for (func_no = 0; func_no < epc->max_functions; func_no++) {
> > > + ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> > > + if (!ep_func)
> > > + return -ENOMEM;
> > > +
> > > + ep_func->func_no = func_no;
> > > + ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> > > + PCI_CAP_ID_MSI);
> > > + ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> > > + PCI_CAP_ID_MSIX);
> > > +
> > > + list_add_tail(&ep_func->list, &ep->func_list);
> > > + }
> > > +
> > > offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
> > >
> > > dw_pcie_dbi_ro_wr_en(pci);
> > > @@ -677,8 +718,6 @@ EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete);
> > > int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > {
> > > int ret;
> > > - void *addr;
> > > - u8 func_no;
> > > struct resource *res;
> > > struct pci_epc *epc;
> > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > @@ -686,7 +725,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > struct platform_device *pdev = to_platform_device(dev);
> > > struct device_node *np = dev->of_node;
> > > const struct pci_epc_features *epc_features;
> > > - struct dw_pcie_ep_func *ep_func;
> > >
> > > INIT_LIST_HEAD(&ep->func_list);
> > >
> > > @@ -708,8 +746,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > }
> > > }
> > >
> > > - dw_pcie_iatu_detect(pci);
> > > -
> > > res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
> > > if (!res)
> > > return -EINVAL;
> > > @@ -717,26 +753,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > ep->phys_base = res->start;
> > > ep->addr_size = resource_size(res);
> > >
> > > - ep->ib_window_map = devm_kcalloc(dev,
> > > - BITS_TO_LONGS(pci->num_ib_windows),
> > > - sizeof(long),
> > > - GFP_KERNEL);
> > > - if (!ep->ib_window_map)
> > > - return -ENOMEM;
> > > -
> > > - ep->ob_window_map = devm_kcalloc(dev,
> > > - BITS_TO_LONGS(pci->num_ob_windows),
> > > - sizeof(long),
> > > - GFP_KERNEL);
> > > - if (!ep->ob_window_map)
> > > - return -ENOMEM;
> > > -
> > > - addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t),
> > > - GFP_KERNEL);
> > > - if (!addr)
> > > - return -ENOMEM;
> > > - ep->outbound_addr = addr;
> > > -
> > > if (pci->link_gen < 1)
> > > pci->link_gen = of_pci_get_max_link_speed(np);
> > >
> > > @@ -753,20 +769,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > if (ret < 0)
> > > epc->max_functions = 1;
> > >
> > > - for (func_no = 0; func_no < epc->max_functions; func_no++) {
> > > - ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> > > - if (!ep_func)
> > > - return -ENOMEM;
> > > -
> > > - ep_func->func_no = func_no;
> > > - ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> > > - PCI_CAP_ID_MSI);
> > > - ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> > > - PCI_CAP_ID_MSIX);
> > > -
> > > - list_add_tail(&ep_func->list, &ep->func_list);
> > > - }
> > > -
> > > if (ep->ops->ep_init)
> > > ep->ops->ep_init(ep);
> > >
> > > @@ -790,6 +792,14 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > > return 0;
> > > }
> > >
> > > + /*
> > > + * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this
> > > + * step as platforms that implement 'core_init_notifier' feature may
> > > + * not have the hardware ready (i.e. core initialized) for access
> > > + * (Ex: tegra194). Any hardware access on such platforms result
> > > + * in system hard hang.
> > > + */
> > > +
> > > return dw_pcie_ep_init_complete(ep);
> > > }
> > > EXPORT_SYMBOL_GPL(dw_pcie_ep_init);
> > > --
> > > 2.17.1
> > >