Re: [PATCH] pci: irq: Add an early parameter to limit pci irq numbers

From: Huacai Chen
Date: Fri May 26 2023 - 04:18:02 EST


Hi, Bjorn and Marc,

On Fri, May 26, 2023 at 5:40 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > to control the limit.
> > > >
> > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > NR_IRQS for other platforms.
> > >
> > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > me. I assume arch code should set things up so only valid IRQ numbers
> > > can be allocated. This doesn't seem necessarily PCI-specific, I'd
> > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > command-line parameter that users have to discover and supply.
> >
> > The problem we meet: LoongArch machines can have as many as 256
> > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > machine, 192 irqs can be easily exhausted if there are several NICs
> > (NIC usually allocates msi irqs depending on the number of online
> > cpus). So we want to limit the msi allocation.
> >
> > This is not a LoongArch-specific problem, because I think other
> > platforms can also meet if they have many NICs. But of course,
> > LoongArch can meet it more easily because the available msi vectors
> > are very few. So, adding a cmdline parameter is somewhat reasonable.
>
> The patch contains "#ifdef CONFIG_LOONGARCH", which makes this
> solution LoongArch-specific. I'm not willing for that yet.
>
> It sounds like the LoongArch MSI limit is known at compile-time, or at
> least at boot-time, so the kernel ought to be able to figure out what
> to do without a command-line parameter.
>
> > After some investigation, I think it may be possible to modify
> > drivers/irqchip/irq-loongson-pch-msi.c and override
> > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > doing that need to remove the "static" before
> > __msi_domain_alloc_irqs(), which means revert
> > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > acceptable.
>
> I guess you mean msi_domain_ops::domain_alloc_irqs() (not
> msi_domain_info). If this is really a generic problem, I'm surprised
> that no other arch has needed to override .domain_alloc_irqs().
Yes, I mean msi_domain_ops::domain_alloc_irqs() here.

>
> I think you'll have better luck getting feedback if you can post the
> complete working patch. At the very least, you'll learn more about
> the problem by doing that.
Emm, I found I can do some small modification on
msi_domain_prepare_irqs(), and solve the problem by overriding
msi_domain_ops::msi_prepare(), thanks. And patches is coming soon.

Huacai
>
> Bjorn