Re: MSIs not freed in GICv3 ITS driver

From: Marc Zyngier
Date: Mon Jul 08 2024 - 13:32:16 EST


Mani,

On Mon, 08 Jul 2024 16:39:33 +0100,
Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> wrote:
>
> Hi Marc, Thomas,
>
> I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from
> PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
> allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> the MSI count to power of 2 to find the order. In this case, the order becomes 2
> in its_alloc_device_irq().

That's because we can only allocate EventIDs as a number of ID
bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
2^24. This is a power-of-two architecture.

> So 4 entries are allocated by bitmap_find_free_region().

Assuming you're calling about its_alloc_device_irq(), it looks like a
bug. Or rather, some laziness on my part. The thing is, this bitmap is
only dealing with sub-allocation in the pool that has been given to
the endpoint. So the power-of-two crap doesn't really matter unless
you are dealing with Multi-MSI, which has actual alignment
requirements.

>
> But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
> will only allocate 3 MSIs, leaving one bitmap entry unused.
>
> And when the driver frees the MSIs using pci_free_irq_vectors(), only 3
> allocated MSIs were freed and their bitmap entries were also released. But the
> entry for the additional bitmap was never released. Due to this,
> its_free_device() was also never called, resulting in the ITS device not getting
> freed.
>
> So when the PCIe driver tries to request the MSIs again (PCIe device being
> removed and inserted back), because the ITS device was not freed previously,
> MSIs were again requested for the same ITS device. And due to the stale bitmap
> entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were
> available. This forces the PCIe driver to reduce the MSI count, which is sub
> optimal.
>
> This behavior might be applicable to other irqchip drivers handling MSI as well.
> I want to know if this behavior is already known with MSI and irqchip drivers?
>
> For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> and use a dummy MSI handler for the extra number of MSIs allocated. This could
> also be done in the generic MSI driver itself to avoid changes in the PCIe
> drivers. But I wouldn't say it is the best possible fix.

No, that's terrible. This is just papering over a design mistake, and
I refuse to go down that road.

>
> Is there any other way to address this issue? Or am I missing something
> completely?

Well, since each endpoint handled by an ITS has its allocation tracked
by a bitmap, it makes more sense to precisely track the allocation.

Here's a quick hack that managed to survive a VM boot. It may even
work. The only problem with it is that it probably breaks a Multi-MSi
device sitting behind a non-transparent bridge that would get its MSIs
allocated after another device. In this case, we wouldn't honor the
required alignment and things would break.

So take this as a proof of concept. If that works, I'll think of how
to deal with this crap in a more suitable way...

M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e6..43479c9e7f8d2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3475,15 +3475,16 @@ static void its_free_device(struct its_device *its_dev)

static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
{
- int idx;
+ unsigned long idx;

/* Find a free LPI region in lpi_map and allocate them. */
- idx = bitmap_find_free_region(dev->event_map.lpi_map,
- dev->event_map.nr_lpis,
- get_count_order(nvecs));
- if (idx < 0)
+ idx = bitmap_find_next_zero_area(dev->event_map.lpi_map,
+ dev->event_map.nr_lpis, 0, nvecs, 0);
+ if (idx >= dev->event_map.nr_lpis)
return -ENOSPC;

+ bitmap_set(dev->event_map.lpi_map, idx, nvecs);
+
*hwirq = dev->event_map.lpi_base + idx;

return 0;
@@ -3653,9 +3654,9 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
struct its_node *its = its_dev->its;
int i;

- bitmap_release_region(its_dev->event_map.lpi_map,
- its_get_event_id(irq_domain_get_irq_data(domain, virq)),
- get_count_order(nr_irqs));
+ bitmap_clear(its_dev->event_map.lpi_map,
+ its_get_event_id(irq_domain_get_irq_data(domain, virq)),
+ nr_irqs);

for (i = 0; i < nr_irqs; i++) {
struct irq_data *data = irq_domain_get_irq_data(domain,

--
Without deviation from the norm, progress is not possible.