RE: [PATCH RFC] cxl/pci: Skip irq features if irq's are not supported

From: Ira Weiny
Date: Wed Jan 10 2024 - 12:14:22 EST


Dan Williams wrote:
> Ira Weiny wrote:

[snip]

> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index a2fcbca253f3..422bc9657e5c 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -410,6 +410,7 @@ enum cxl_devtype {
> > * @ram_res: Active Volatile memory capacity configuration
> > * @serial: PCIe Device Serial Number
> > * @type: Generic Memory Class device or Vendor Specific Memory device
> > + * @irq_supported: Flag if irqs are supported by the device
> > */
> > struct cxl_dev_state {
> > struct device *dev;
> > @@ -424,6 +425,7 @@ struct cxl_dev_state {
> > struct resource ram_res;
> > u64 serial;
> > enum cxl_devtype type;
> > + bool irq_supported;
>
> I would rather not carry this init-time-only relevant flag in perpetuity
> in the state structure.

Fair enough.

> Let cxl_pci_probe() see the result from
> cxl_alloc_irq_vectors() and then optionally skip calling setup for
> features that demand interrupt support.

Yeah, it is better for the bool to be a local variable in cxl_pci_probe().
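
A rough userspace sketch of that shape, only to illustrate the control
flow. Every mock_* name here is hypothetical and stands in for the real
driver functions; this is not the actual patch:

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for cxl_alloc_irq_vectors():
 * returns 0 when vectors were allocated, negative otherwise.
 */
static int mock_alloc_irq_vectors(int vectors_available)
{
	return vectors_available > 0 ? 0 : -1;
}

/* Hypothetical irq-dependent feature setup; counts how often it runs. */
static int event_setup_calls;

static void mock_event_setup(void)
{
	event_setup_calls++;
}

/*
 * Mock probe: the flag lives only on the probe stack instead of in the
 * long-lived device state. Returns whether irq features were set up.
 */
static bool mock_probe(int vectors_available)
{
	bool irq_avail = mock_alloc_irq_vectors(vectors_available) == 0;

	/* Skip setup for features that demand interrupt support. */
	if (irq_avail)
		mock_event_setup();

	return irq_avail;
}
```

The point is only that the flag goes away when probe returns, so nothing
init-time-only is carried in struct cxl_dev_state.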

>
> > };
> >
> > /**
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 0155fb66b580..bb90ac011290 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -443,6 +443,12 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds)
> > if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ))
> > return 0;
> >
> > + if (!cxlds->irq_supported) {
> > + dev_err(cxlds->dev, "Mailbox interrupts enabled but device indicates no interrupt vectors supported.\n");
> > + dev_err(cxlds->dev, "Skip mailbox interrupt configuration.\n");
> > + return 0;
> > + }
>
> I see no need to emit a log message here as the code is happy to
> support a mailbox in polled mode.

True. However, this indicates an error with the device IMO. The device
did not support MSI/MSI-X yet indicates irq support for mailboxes.
That is not a well-behaved device even if it will work. We are not
failing the probe here, but I think the error gives users good insight.

We could just make it dev_dbg() though.

> I.e. this is not an error that the
> user should call their device-vendor about because the end user will see no
> loss of functionality.

But it is not exactly a nice device IMO.

>
> The code right after this is already fully tolerant of IRQ setup errors:

Agreed, which is why only the error was printed and the irq setup calls
were skipped for good measure.

If you feel strongly about it, I can just drop the hunk, but I still think
it is worth some message for those devices behaving this way.

>
> irq = pci_irq_vector(to_pci_dev(cxlds->dev), msgnum);
> if (irq < 0)
> return 0;
>
> if (cxl_request_irq(cxlds, irq, cxl_pci_mbox_irq))
> return 0;
>
>

[snip]

> >
> > static irqreturn_t cxl_event_thread(int irq, void *id)
> > @@ -754,6 +762,13 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
> > if (!host_bridge->native_cxl_error)
> > return 0;
> >
> > + /* Polling not supported */
> > + if (!mds->cxlds.irq_supported) {
> > + dev_err(mds->cxlds.dev, "Host events enabled but device indicates no interrupt vectors supported.\n");
> > + dev_err(mds->cxlds.dev, "Event polling is not supported, skip event processing.\n");
> > + return 0;
> > + }
>
> This one can be a dev_info(), since there is no polling fallback and it
> is unlikely that a device supports events without supporting interrupts.

Sounds good.

>
> ...or maybe unify all these notifications in the result from
> cxl_alloc_irq_vectors():
>
> rc = cxl_alloc_irq_vectors();
> if (rc) {
> dev_dbg(dev, "No interrupt support, interrupt-dependent features disabled.\n");
> interrupts_supported = false;
> }
>
> Where dev_dbg() instead of dev_info() because the people that are
> missing features will report this debug log and upstream can say...
> "yup, there's your problem". Where users with cards that are known to
> not support interrupts do not otherwise spam the logs with info they
> know already.
>
> I also note that cxl_request_irq() will do the right thing, so we likely
> don't even need that interrupts_supported flag.

Perhaps, but devices which don't support interrupts by design (and don't
attempt to have any irq features) should be silent IMO. Why spam the log
with that information, even if only during a debug session?

For example if a user has 2 devices, 1 broken from vendor X and 1 which
just does not do irqs from vendor Y, the above would be printed for both
devices when they are trying to debug the broken device. Then they have
to rely on both vendors to report back.

In the case of reporting an actual error they can call vendor X and leave
vendor Y alone.

I know it is more code and you wanted the smallest possible change but I
think this is worth some code.
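
To make the distinction concrete, here is a small userspace sketch of the
decision I am arguing for. struct mock_dev, its fields, and
irq_report_for() are made up for illustration and are not driver code:

```c
#include <stdbool.h>

/* Hypothetical outcome: stay silent, or report a device error. */
enum irq_report {
	REPORT_NONE,
	REPORT_ERROR,
};

/* Hypothetical summary of what a device looks like at probe time. */
struct mock_dev {
	bool has_vectors;         /* MSI/MSI-X vector allocation succeeded */
	bool advertises_irq_feat; /* e.g. mailbox irq capability or events */
};

/*
 * Only complain when the device claims an irq-dependent feature but has
 * no vectors. A device that simply does not do irqs stays silent.
 */
static enum irq_report irq_report_for(const struct mock_dev *d)
{
	if (!d->has_vectors && d->advertises_irq_feat)
		return REPORT_ERROR;

	return REPORT_NONE;
}
```

With this, the vendor X device (no vectors, but advertising irq features)
produces a message, and the vendor Y device (no vectors, no irq features)
produces nothing.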

I'll rework this a bit and send a V1 for real review.

Ira