Re: [PATCH v15 4/5] PCI/DPC: Add Error Disconnect Recover (EDR) support

From: Bjorn Helgaas
Date: Wed Feb 26 2020 - 16:32:38 EST


On Wed, Feb 26, 2020 at 10:42:27AM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 2/25/20 5:02 PM, Bjorn Helgaas wrote:
> > On Thu, Feb 13, 2020 at 10:20:16AM -0800, sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> > > ...

> > > +static void edr_handle_event(acpi_handle handle, u32 event, void *data)
> > > +{
> > > + struct dpc_dev *dpc = data, ndpc;

> > There's actually very little use of struct dpc_dev in this file. I
> > bet with a little elbow grease, we could remove it completely and just
> > use the pci_dev * or maybe an opaque pointer.
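A minimal userspace sketch of this suggestion. It assumes the notify handler's context pointer is registered as the pci_dev itself rather than as a struct dpc_dev. `struct mock_pci_dev`, `mock_notify_fn` and `mock_edr_handle_event()` are hypothetical stand-ins. The handler shape only mirrors the (handle, event, context) arguments of an ACPI notify handler.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel definition. */
struct mock_pci_dev {
	const char *name;
	uint16_t dpc_cap;	/* cached DPC capability offset, 0 if absent */
};

/* Modelled on an ACPI notify handler: (handle, event, context). */
typedef void (*mock_notify_fn)(void *handle, uint32_t event, void *data);

static int events_seen;

static void mock_edr_handle_event(void *handle, uint32_t event, void *data)
{
	/* The context is the pci_dev itself: no struct dpc_dev wrapper needed. */
	struct mock_pci_dev *pdev = data;

	(void)handle;
	printf("%s: ACPI event %#x received\n", pdev->name, (unsigned int)event);
	events_seen++;
}
```

The context is whatever pointer was supplied when the handler was installed. Passing the pci_dev there means nothing has to be allocated just to carry it.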

> Yes, we could remove it, but that would need some more changes to the
> DPC driver functions. I can think of two ways:
>
> 1. Refactor the DPC driver to not use the dpc_dev structure and just use
> pci_dev in its function implementations. But this might lead to
> re-reading the following dpc_dev structure members every time we
> use them in DPC driver functions.
>
> (Currently the DPC driver caches the following device parameters at probe time.)
>
>       u16     cap_pos;
>       bool    rp_extensions;
>       u8      rp_log_size;
>       u16     ctl;
>       u16     cap;

I think this is basically what I proposed with the sample patch in my
response to your 3/5 patch. But I don't see the ctl/cap part, so
maybe I missed something.
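Option 1 amounts to caching the DPC fields in the device structure once, at init time. The sketch below is a hypothetical userspace model: `struct mock_pci_dev`, its `dpc_*` fields and `mock_pci_dpc_init()` are illustrative names only. The register layout is assumed to match the PCIe spec and should be checked before relying on it: DPC Capability register at offset 0x04 of the capability, RP Extensions at bit 5, RP PIO Log Size in bits 11:8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev with the DPC fields folded in. */
struct mock_pci_dev {
	uint8_t cfg[4096];	/* fake extended config space, little-endian */
	uint16_t dpc_cap;	/* offset of the DPC capability, 0 if absent */
	bool dpc_rp_extensions;
	uint8_t dpc_rp_log_size;
};

static uint16_t mock_cfg_read16(const struct mock_pci_dev *d, uint16_t off)
{
	return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8));
}

/* Read the DPC Capability register once and cache what later code needs. */
static void mock_pci_dpc_init(struct mock_pci_dev *d, uint16_t cap_pos)
{
	uint16_t cap;

	d->dpc_cap = cap_pos;
	d->dpc_rp_extensions = false;
	d->dpc_rp_log_size = 0;
	if (!cap_pos)
		return;

	cap = mock_cfg_read16(d, cap_pos + 0x04);	/* DPC Capability register */
	d->dpc_rp_extensions = cap & 0x0020;		/* RP Extensions for DPC */
	if (d->dpc_rp_extensions) {
		d->dpc_rp_log_size = (cap >> 8) & 0xf;	/* RP PIO Log Size */
		/* Treat sizes outside 4..9 as invalid (assumption modelled on dpc.c). */
		if (d->dpc_rp_log_size < 4 || d->dpc_rp_log_size > 9)
			d->dpc_rp_log_size = 0;
	}
}
```

Reading the register once at init is what removes the re-reading concern raised for option 1. Later error handling only needs the pci_dev.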

> 2. We can create a new variant of dpc_process_error() that takes a pci_dev
> and moves the dpc_dev initialization into it. The downside is that we would
> redo this initialization every time we get a DPC event (which should be rare).
>
> void dpc_process_error(struct pci_dev *pdev)
> {
>     struct dpc_dev dpc;
>
>     dpc_dev_init(pdev, &dpc);
>
>     ....
> }
>
> Let me know your comments.
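A sketch of option 2 under similar assumptions. `mock_dpc_process_error()`, `mock_dpc_dev_init()` and both structures are hypothetical userspace stand-ins, and clearing a fake status field stands in for the real error handling. The counter makes the stated downside visible: every event pays for a fresh initialization.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real structures live in the kernel. */
struct mock_pci_dev {
	uint16_t dpc_cap_pos;	/* where the DPC capability would be found */
	uint16_t dpc_status;	/* fake DPC Status register */
};

struct mock_dpc_dev {
	struct mock_pci_dev *pdev;
	uint16_t cap_pos;
};

static unsigned int init_calls;	/* counts per-event re-initialization */

static void mock_dpc_dev_init(struct mock_pci_dev *pdev, struct mock_dpc_dev *dpc)
{
	dpc->pdev = pdev;
	dpc->cap_pos = pdev->dpc_cap_pos;	/* real code would search config space */
	init_calls++;
}

/* Option 2: the entry point takes a pci_dev and builds a dpc_dev on the stack. */
static int mock_dpc_process_error(struct mock_pci_dev *pdev)
{
	struct mock_dpc_dev dpc;

	mock_dpc_dev_init(pdev, &dpc);
	if (!dpc.cap_pos)
		return -1;		/* no DPC capability: nothing to handle */

	dpc.pdev->dpc_status = 0;	/* stand-in for reading and clearing status */
	return 0;
}
```

DPC events should be rare, so the repeated init is cheap in practice. The real cost is that it would re-read config space on every event.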
>
> >
> > > + struct pci_dev *pdev = dpc->pdev;
> > > + pci_ers_result_t estate = PCI_ERS_RESULT_DISCONNECT;
> > > + u16 status;
> > > +
> > > + pci_info(pdev, "ACPI event %#x received\n", event);
> > > +
> > > + if (event != ACPI_NOTIFY_DISCONNECT_RECOVER)
> > > + return;
> > > +
> > > + /*
> > > + * Check if _DSM(0xD) is available, and if present locate the
> > > + * port which issued EDR event.
> > > + */
> > > + pdev = acpi_locate_dpc_port(pdev);

> > This function name should include "get" since it's part of the
> > pci_dev_get()/pci_dev_put() sequence.

> How about acpi_dpc_port_get(pdev)?

OK.
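The naming point is about ownership. A function with "get" in its name returns the device with a reference held, and the caller must drop that reference with the matching put. Below is a minimal refcount model: the hypothetical `mock_*` names stand in for pci_dev_get()/pci_dev_put() and the proposed acpi_dpc_port_get(). Like pci_dev_get()/pci_dev_put(), the mock helpers accept NULL.

```c
#include <stddef.h>

/* Hypothetical, minimal device with a reference count. */
struct mock_pci_dev {
	int refcount;
	struct mock_pci_dev *dpc_port;	/* port firmware would report, or NULL */
};

static struct mock_pci_dev *mock_pci_dev_get(struct mock_pci_dev *d)
{
	if (d)
		d->refcount++;
	return d;
}

static void mock_pci_dev_put(struct mock_pci_dev *d)
{
	if (d)
		d->refcount--;
}

/* "_get": on success the caller owns a reference and must put it. */
static struct mock_pci_dev *mock_acpi_dpc_port_get(struct mock_pci_dev *pdev)
{
	/* A real implementation would evaluate _DSM(0xD); this answer is canned. */
	return mock_pci_dev_get(pdev->dpc_port);
}
```

The EDR handler would then pair every successful acpi_dpc_port_get() with a pci_dev_put() on all of its exit paths.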

> > > + if (!pdev) {
> > > + pci_err(dpc->pdev, "No valid port found\n");

This message should be expanded somehow. I think the point is that we
got an EDR notification, but firmware couldn't tell us where the
containment event occurred. Should that ever happen? Or is it a
firmware defect if it *does* happen?

In any event, I think the message should say something like "Can't
identify source of EDR notification".
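For illustration, the early-exit path with the suggested wording could look like the sketch below. Everything here is a hypothetical userspace model:

- the message goes into a buffer instead of the kernel log;
- `mock_locate_port()` returns a canned answer instead of evaluating _DSM(0xD);
- the notify value 0x0F for Error Disconnect Recover is an assumption to check against ACPICA's ACPI_NOTIFY_DISCONNECT_RECOVER.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed value of the Error Disconnect Recover notification. */
#define MOCK_NOTIFY_DISCONNECT_RECOVER 0x0f

struct mock_pci_dev {
	const char *name;
	struct mock_pci_dev *reported_port;	/* what firmware would report, or NULL */
};

static char last_msg[128];	/* captures the would-be log line */

static struct mock_pci_dev *mock_locate_port(struct mock_pci_dev *pdev)
{
	return pdev->reported_port;
}

static int mock_edr_handle_event(struct mock_pci_dev *pdev, uint32_t event)
{
	struct mock_pci_dev *port;

	if (event != MOCK_NOTIFY_DISCONNECT_RECOVER)
		return 0;	/* not an EDR notification: nothing to do */

	port = mock_locate_port(pdev);
	if (!port) {
		/* Say what went wrong: firmware did not name the contained port. */
		snprintf(last_msg, sizeof(last_msg),
			 "%s: Can't identify source of EDR notification", pdev->name);
		return -1;
	}

	snprintf(last_msg, sizeof(last_msg), "%s: EDR event handled", port->name);
	return 0;
}
```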

> > > + return;
> > > + }
> > > +
> > > + if (pdev != dpc->pdev) {
> > > + pci_warn(pdev, "Initializing dpc again\n");
> > > + dpc_dev_init(pdev, &ndpc);

> > This seems... I'm not sure what. I guess it's really just reading
> > the DPC capability for use by dpc_process_error(), so maybe it's OK.
> > But it's a little strange to read.

I *think* maybe if we move the DPC info into the struct pci_dev it
will solve this issue too? I.e., we won't have a struct dpc_dev, so
we won't have this funny-looking dpc_dev_init().

> > Is this something we should be warning about?

> No, this is a valid case. It will only happen if we have a non-ACPI-based
> switch attached to the root port.

I agree this is a valid case (as I mentioned below). My point was
just that if it is a valid case, we might not want to use pci_warn()
here. Maybe pci_info() if you think it's important, or maybe no
message at all. I don't think "Initializing dpc again" is going to be
useful to a user.

> > I think the ECR says
> > it's legitimate to return a child device, doesn't it?

> > > + * TODO: Remove dependency on ACPI FIRMWARE_FIRST bit to
> > > + * determine ownership of DPC between firmware or OS.

> > Extend the comment to say how we *should* determine ownership.

> Yes, ownership should be based on _OSC negotiation. I will add the necessary
> comments here.

Why are we not doing this via _OSC negotiation in this series? It
would be much better if we could just do it instead of adding a
comment that we *should* do it. Nobody knows more about this than you
do, so probably nobody else is going to come along and finish this
up :)
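A minimal sketch of deciding ownership from _OSC rather than from the FIRMWARE_FIRST bit. It rests on these assumptions:

- the host-bridge structure and helpers are hypothetical;
- the DPC control bit value (bit 7 of the _OSC Control field) should be verified against the PCI Firmware Specification;
- the rule that EDR only matters when firmware keeps DPC control is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit for DPC control in the _OSC Control field; verify against the spec. */
#define MOCK_OSC_DPC_CONTROL 0x00000080u

/* Hypothetical summary of what _OSC negotiation returned for a host bridge. */
struct mock_host_bridge {
	uint32_t osc_granted;	/* Control bits firmware granted to the OS */
};

/* The OS owns DPC only if firmware granted DPC control via _OSC. */
static bool mock_os_owns_dpc(const struct mock_host_bridge *hb)
{
	return hb->osc_granted & MOCK_OSC_DPC_CONTROL;
}

/* Simplification: EDR notifications matter only when firmware kept DPC. */
static bool mock_edr_needed(const struct mock_host_bridge *hb)
{
	return !mock_os_owns_dpc(hb);
}
```

In the kernel this decision would presumably sit alongside the existing _OSC handling for the host bridge, next to the other native-control flags.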

> > > + dpc = devm_kzalloc(&pdev->dev, sizeof(*dpc), GFP_KERNEL);

> > This kzalloc should be in dpc.c, not here.
> >
> > And I don't see a corresponding free.

> It will be freed when the pdev is removed, right? Do you want to free it
> explicitly in this driver?

Nope, you're right. I always forget about the devm magic, sorry.
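This is the devm behaviour in question. Memory from devm_kzalloc() is tied to the device and freed automatically when its managed resources are released; the kernel documents this as happening on driver detach. No explicit free is needed. The model below is a much-simplified, hypothetical userspace imitation of that bookkeeping, not the real devres implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one managed allocation. */
struct mock_devres {
	struct mock_devres *next;
	void *ptr;
};

/* Hypothetical device that owns a list of managed allocations, newest first. */
struct mock_device {
	struct mock_devres *res;
	unsigned int nr_res;
};

/* Zeroed allocation whose lifetime is tied to dev. */
static void *mock_devm_kzalloc(struct mock_device *dev, size_t size)
{
	struct mock_devres *dr = malloc(sizeof(*dr));

	if (!dr)
		return NULL;
	dr->ptr = calloc(1, size);
	if (!dr->ptr) {
		free(dr);
		return NULL;
	}
	dr->next = dev->res;
	dev->res = dr;
	dev->nr_res++;
	return dr->ptr;
}

/* What the core would do on release: free everything, newest first. */
static void mock_devres_release_all(struct mock_device *dev)
{
	while (dev->res) {
		struct mock_devres *dr = dev->res;

		dev->res = dr->next;
		free(dr->ptr);
		free(dr);
		dev->nr_res--;
	}
}
```

Releasing newest-first means later allocations, which may depend on earlier ones, are freed before the resources they depend on.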