Re: [PATCH] pci: irq: Add an early parameter to limit pci irq numbers

From: Bjorn Helgaas
Date: Thu May 25 2023 - 17:41:00 EST

On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > to control the limit.
> > >
> > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > NR_IRQS for other platforms.
> >
> > The IRQ experts can chime in on this, but this doesn't feel right to
> > me. I assume arch code should set things up so only valid IRQ numbers
> > can be allocated. This doesn't seem necessarily PCI-specific, I'd
> > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > command-line parameter that users have to discover and supply.
> The problem we meet: LoongArch machines can have as many as 256
> logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> machine, 192 irqs can be easily exhausted if there are several NICs
> (NIC usually allocates msi irqs depending on the number of online
> cpus). So we want to limit the msi allocation.
> This is not a LoongArch-specific problem, because I think other
> platforms can also meet if they have many NICs. But of course,
> LoongArch can meet it more easily because the available msi vectors
> are very few. So, adding a cmdline parameter is somewhat reasonable.

The patch contains "#ifdef CONFIG_LOONGARCH", which makes this
solution LoongArch-specific. I'm not willing for that yet.

It sounds like the LoongArch MSI limit is known at compile-time, or at
least at boot-time, so the kernel ought to be able to figure out what
to do without a command-line parameter.

> After some investigation, I think it may be possible to modify
> drivers/irqchip/irq-loongson-pch-msi.c and override
> msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> doing that need to remove the "static" before
> __msi_domain_alloc_irqs(), which means revert
> 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> __msi_domain_alloc_irqs() static"), I don't know whether that is
> acceptable.

I guess you mean msi_domain_ops::domain_alloc_irqs() (not
msi_domain_info). If this is really a generic problem, I'm surprised
that no other arch has needed to override .domain_alloc_irqs().

I think you'll have better luck getting feedback if you can post the
complete working patch. At the very least, you'll learn more about
the problem by doing that.