Re: [PATCH 1/2] genirq/msi, platform-msi: Adjust return value of msi_domain_prepare_irqs()

From: Thomas Gleixner
Date: Mon May 29 2023 - 16:20:22 EST


Huacai!

On Mon, May 29 2023 at 17:36, Huacai Chen wrote:
> On Mon, May 29, 2023 at 5:27 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> By default you allow up to 256 interrupts to be allocated, right? So to
>> prevent vector exhaustion, the admin needs to reboot the machine and set
>> a command line parameter to limit this, right? As that parameter is not
>> documented the admin is going to dice a number. That's impractical and
>> just a horrible bandaid.
>
> OK, I think I should update the documents in the new version.

Updating documentation neither makes it more practical (it still
requires a reboot) nor does it justify the abuse of the msi_prepare()
callback.

The only reason why this hack "works" is that there is a historical
mechanism which tells the PCI/MSI core that the number of requested
vectors cannot be allocated, but that $N vectors would be possible.
Even that return value comes with no guarantee.
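For readers unfamiliar with that convention, the caller side of it
looks roughly like the C sketch below. All names are hypothetical
stand-ins, not the actual PCI/MSI core functions. The allocator returns
0 on success, a negative errno on failure, or a positive count as a
hint, and the caller shrinks its request and retries.

```c
#include <errno.h>

/* Hypothetical allocator using the historical convention:
 * 0 on success, negative errno on hard failure, or a positive
 * count meaning "request fewer, $N might be possible". */
static int hypothetical_alloc_vectors(int requested, int available)
{
	if (requested <= 0)
		return -EINVAL;
	if (available <= 0)
		return -ENOSPC;
	if (requested > available)
		return available;	/* a hint only, not a reservation */
	return 0;
}

/* Caller-side retry loop: shrink the request to the hinted count and
 * try again. In a real system other allocations can consume vectors
 * between two calls, so the hint is not a guarantee and the loop is
 * bounded. Returns the number of vectors allocated or a negative errno. */
static int alloc_with_retry(int requested, int available)
{
	for (int tries = 0; tries < 3; tries++) {
		int ret = hypothetical_alloc_vectors(requested, available);

		if (ret == 0)
			return requested;
		if (ret < 0)
			return ret;
		requested = ret;
	}
	return -ENOSPC;
}
```

Note that the caller, not the irqdomain, ends up deciding the final
number, which is why this is a poor vehicle for a per-bus limit.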

This mechanism is ill-defined and really should go away.

Adding yet another way to limit this via msi_prepare() is just
proliferating this ill-defined mechanism and I have zero interest in
that.

Let's take a step back and look at the larger picture:

1) A PCI/MSI irqdomain is attached to a PCI bus

2) The number of PCI devices on that PCI bus is usually known at boot
time _before_ the first device driver is probed.

That's not entirely true for PCI hotplug devices, but that's hardly
relevant for an architecture which was designed less than 10 years
ago, where the architects decided that 256 MSI vectors are good
enough for up to 256 CPUs. The concept of per-CPU queues was already
known at that time, no?

So the irqdomain can tell the PCI/MSI core the maximum number of vectors
available for a particular bus, right?

The default, i.e. if the irqdomain does not expose that information,
would be "unlimited", i.e. ULONG_MAX.
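A minimal C sketch of that default, assuming a hypothetical per-bus
structure (struct msi_bus_info) and helper (msi_bus_max_vectors()),
neither of which exists in the kernel. A domain which provides no
information, or a value of 0, is treated as unlimited.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-bus information an MSI irqdomain could expose. */
struct msi_bus_info {
	unsigned long max_vectors;	/* 0 = not provided by the domain */
};

/* Return the vector budget for a bus. A domain which does not expose
 * the information is treated as "unlimited", i.e. ULONG_MAX. */
static unsigned long msi_bus_max_vectors(const struct msi_bus_info *info)
{
	if (info == NULL || info->max_vectors == 0)
		return ULONG_MAX;
	return info->max_vectors;
}
```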

Now take that number and divide it by the number of devices on the bus
and you get a sensible per-device limit which at least does not
immediately cause vector exhaustion.
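The arithmetic is trivial, but as a sketch (hypothetical helper names,
not an existing PCI/MSI interface): with 256 vectors and 4 devices on
the bus, each device gets 64, so a driver asking for one vector per
CPU on a 256 CPU machine is clamped to 64 instead of draining the
whole bus. With the ULONG_MAX default the division still yields an
effectively unlimited share.

```c
/* Hypothetical helper: split a bus-wide vector budget evenly between
 * the devices found on the bus. */
static unsigned long msi_per_device_limit(unsigned long bus_vectors,
					  unsigned int ndevs)
{
	if (ndevs == 0)
		return bus_vectors;
	return bus_vectors / ndevs;
}

/* Clamp a driver's request (e.g. one vector per CPU queue) to the
 * per-device share. */
static unsigned long msi_clamp_request(unsigned long requested,
				       unsigned long bus_vectors,
				       unsigned int ndevs)
{
	unsigned long limit = msi_per_device_limit(bus_vectors, ndevs);

	return requested < limit ? requested : limit;
}
```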

That limit might be suboptimal if there are lots of other devices on
that bus which just require one or two vectors, but that's something
which can be optimized via a generic command line option or even a sysfs
mechanism.
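How such an override could be applied, as a C sketch: the option name
"msi_max_per_dev=" and both helpers are hypothetical, chosen only for
illustration. The parser accepts a positive decimal value, as it could
arrive from a boot parameter or a sysfs write, and a set override
replaces the computed even share.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser for an admin override, e.g. the value part of a
 * boot option "msi_max_per_dev=<n>" or a string written to a sysfs
 * file. Returns 0 and stores the value on success, -EINVAL otherwise. */
static int parse_msi_override(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long val;

	/* strtoul() silently accepts a leading '-', so reject it here */
	if (arg == NULL || *arg == '\0' || *arg == '-')
		return -EINVAL;
	errno = 0;
	val = strtoul(arg, &end, 10);
	if (errno != 0 || *end != '\0' || val == 0)
		return -EINVAL;
	*out = val;
	return 0;
}

/* An override of 0 means "not set": fall back to the computed share. */
static unsigned long effective_limit(unsigned long computed,
				     unsigned long override)
{
	return override ? override : computed;
}
```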

Thanks,

tglx