Re: [PATCH] pci: irq: Add an early parameter to limit pci irq numbers

From: Huacai Chen
Date: Thu May 25 2023 - 05:14:49 EST


Hi, Bjorn,

On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> [+cc Marc, LKML]
>
> On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > many as logical cpu numbers. So we should limit pci irq numbers when
> > allocate msi/msix vectors, otherwise some device drivers may fail at
> > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > to control the limit.
> >
> > The default pci msi/msix number limit is defined 32 for LoongArch and
> > NR_IRQS for other platforms.
>
> The IRQ experts can chime in on this, but this doesn't feel right to
> me. I assume arch code should set things up so only valid IRQ numbers
> can be allocated. This doesn't seem necessarily PCI-specific, I'd
> prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> command-line parameter that users have to discover and supply.
The problem we meet: LoongArch machines can have as many as 256
logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
machine, 192 irqs can be easily exhausted if there are several NICs
(NIC usually allocates msi irqs depending on the number of online
cpus). So we want to limit the msi allocation.

This is not a LoongArch-specific problem, because I think other
platforms can also meet if they have many NICs. But of course,
LoongArch can meet it more easily because the available msi vectors
are very few. So, adding a cmdline parameter is somewhat reasonable.

After some investigation, I think it may be possible to modify
drivers/irqchip/irq-loongson-pch-msi.c and override
msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
doing that need to remove the "static" before
__msi_domain_alloc_irqs(), which means revert
762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
__msi_domain_alloc_irqs() static"), I don't know whether that is
acceptable.

If such a revert is not acceptable, it seems that we can only use the
method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
a little better.

Huacai

>
> > Signed-off-by: Juxin Gao <gaojuxin@xxxxxxxxxxx>
> > Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> > ---
> > drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > 1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > index ef1d8857a51b..6617381e50e7 100644
> > --- a/drivers/pci/msi/msi.c
> > +++ b/drivers/pci/msi/msi.c
> > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > return ret;
> > }
> >
> > +#ifdef CONFIG_LOONGARCH
> > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > +#else
> > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > +#endif
> > +
> > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > +
> > +static int __init pci_irq_limit(char *str)
> > +{
> > + get_option(&str, &pci_irq_limits);
> > +
> > + if (pci_irq_limits == 0)
> > + pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > +
> > + return 0;
> > +}
> > +
> > +early_param("pci_irq_limit", pci_irq_limit);
> > +
> > int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > struct irq_affinity *affd)
> > {
> > int nvec;
> > int rc;
> >
> > + maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > +
> > if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > return -EINVAL;
> >
> > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > int maxvec, struct irq_affinity *affd, int flags)
> > {
> > - int hwsize, rc, nvec = maxvec;
> > + int hwsize, rc, nvec;
> > +
> > + nvec = clamp_val(maxvec, 0, pci_irq_limits);
> >
> > if (maxvec < minvec)
> > return -ERANGE;
> > --
> > 2.39.1
> >