Re: [PATCH] pci: irq: Add an early parameter to limit pci irq numbers

From: Manivannan Sadhasivam
Date: Mon May 29 2023 - 01:39:40 EST


On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> Hi, Manivannan,
>
> On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> >
> > On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > > Hi, Bjorn,
> > >
> > > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > >
> > > > [+cc Marc, LKML]
> > > >
> > > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > > > Some platforms (such as LoongArch) cannot provide as many irq numbers
> > > > > > as there are logical cpus. So we should limit pci irq numbers when
> > > > > > allocating msi/msix vectors, otherwise some device drivers may fail at
> > > > > > initialization. This patch adds a cmdline parameter "pci_irq_limit=xxxx"
> > > > > > to control the limit.
> > > > > >
> > > > > > The default pci msi/msix number limit is defined as 32 for LoongArch and
> > > > > > NR_IRQS for other platforms.
> > > >
> > > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > > me. I assume arch code should set things up so only valid IRQ numbers
> > > > can be allocated. This doesn't seem necessarily PCI-specific, I'd
> > > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > > command-line parameter that users have to discover and supply.
> > > The problem we meet: LoongArch machines can have as many as 256
> > > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > > machine, 192 irqs can be easily exhausted if there are several NICs
> > > (NIC usually allocates msi irqs depending on the number of online
> > > cpus). So we want to limit the msi allocation.
> > >
> >
> > If the MSI allocation fails with multiple vectors, then the NIC driver should
> > revert to a single MSI vector. Is that happening in your case?
> Thank you for pointing this out. Yes, I know most existing drivers
> will fall back to a single msi or legacy irqs on failure. However, as I
> replied in another thread (the new solution to this problem [1]), we
> want to do some proactive throttling rather than consume msi vectors
> aggressively. For example, if we have two NICs, we want both of them
> to get 32 msi vectors, rather than one exhausting all available vectors
> and the other falling back to a single msi or legacy irq.
>
> I hope I have explained clearly, thanks.
>

The problem you are facing is not specific to Loongson but rather generic. And
the solution we have currently is one you are already aware of, it seems. So if
you want to propose an alternative solution, it should be generic, and you need
to give the maintainers a good justification, i.e., a comparison of the two
solutions and why yours is better.

But IMO what you are proposing seems use-case driven and may not work all the
time due to architectural limitations. This again suggests that the existing
solution is sufficient.
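
To make the two behaviours being compared concrete, here is a rough user-space
sketch, not kernel code. It assumes a toy model in which all devices share one
pool of MSI vectors (192 in the LoongArch example quoted above) and each driver
asks for one vector per CPU. The names vec_pool, toy_alloc_range() and
toy_driver_setup() are hypothetical and only loosely mirror a ranged
[minvec, maxvec] request; real allocation goes through the irq domain and is
more involved.

```c
#include <assert.h>
#include <errno.h>

/* Toy model (assumption): one shared pool of MSI vectors for all devices. */
struct vec_pool {
	int avail;
};

/*
 * Hypothetical stand-in for a ranged request: grant as many vectors as
 * possible within [minvec, maxvec], or return -ENOSPC if fewer than
 * minvec are left in the pool.
 */
static int toy_alloc_range(struct vec_pool *pool, int minvec, int maxvec)
{
	int nvec;

	assert(minvec >= 1 && maxvec >= minvec);

	nvec = maxvec < pool->avail ? maxvec : pool->avail;
	if (nvec < minvec)
		return -ENOSPC;

	pool->avail -= nvec;
	return nvec;
}

/*
 * Driver-side policy: ask for one vector per CPU, optionally capped by
 * 'limit' (0 means no cap). Returns the number of vectors granted, or 0
 * when nothing is left and the device has to use a legacy IRQ.
 */
static int toy_driver_setup(struct vec_pool *pool, int ncpus, int limit)
{
	int want = (limit > 0 && limit < ncpus) ? limit : ncpus;
	int got = toy_alloc_range(pool, 1, want);

	return got < 0 ? 0 : got;
}
```

In this model, two NICs on a 256-CPU machine with no cap leave the second NIC
without any MSI vector, while a cap of 32 gives each NIC 32 vectors. Whether
that trade-off is better in general is exactly the justification that would
need to be spelled out.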

- Mani

> [1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@xxxxxxxxxxx/T/#t
>
> Huacai
> >
> > - Mani
> >
> > > This is not a LoongArch-specific problem, because I think other
> > > platforms can also hit it if they have many NICs. But of course,
> > > LoongArch can hit it more easily because the available msi vectors
> > > are very few. So, adding a cmdline parameter is somewhat reasonable.
> > >
> > > After some investigation, I think it may be possible to modify
> > > drivers/irqchip/irq-loongson-pch-msi.c and override
> > > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > > doing that requires removing the "static" before
> > > __msi_domain_alloc_irqs(), which means reverting
> > > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > > __msi_domain_alloc_irqs() static"). I don't know whether that is
> > > acceptable.
> > >
> > > If such a revert is not acceptable, it seems that we can only use the
> > > method in this patch. Maybe renaming pci_irq_limits to pci_msi_limits
> > > would be a little better.
> > >
> > > Huacai
> > >
> > > >
> > > > > Signed-off-by: Juxin Gao <gaojuxin@xxxxxxxxxxx>
> > > > > Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> > > > > ---
> > > > > drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > > > 1 file changed, 25 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > > index ef1d8857a51b..6617381e50e7 100644
> > > > > --- a/drivers/pci/msi/msi.c
> > > > > +++ b/drivers/pci/msi/msi.c
> > > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > > > >  	return ret;
> > > > > >  }
> > > > > >
> > > > > > +#ifdef CONFIG_LOONGARCH
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > > > +#else
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > > > +#endif
> > > > > > +
> > > > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +static int __init pci_irq_limit(char *str)
> > > > > > +{
> > > > > > +	get_option(&str, &pci_irq_limits);
> > > > > > +
> > > > > > +	if (pci_irq_limits == 0)
> > > > > > +		pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > > > +
> > > > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > > > >  			   struct irq_affinity *affd)
> > > > > >  {
> > > > > >  	int nvec;
> > > > > >  	int rc;
> > > > > >
> > > > > > +	maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > > +
> > > > > >  	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > > > >  		return -EINVAL;
> > > > >
> > > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > > > >  			    int maxvec, struct irq_affinity *affd, int flags)
> > > > > >  {
> > > > > > -	int hwsize, rc, nvec = maxvec;
> > > > > > +	int hwsize, rc, nvec;
> > > > > > +
> > > > > > +	nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > >
> > > > > >  	if (maxvec < minvec)
> > > > > >  		return -ERANGE;
> > > > > --
> > > > > 2.39.1
> > > > >
> >
> > --
> > மணிவண்ணன் சதாசிவம்

--
மணிவண்ணன் சதாசிவம்