Re: [RFC V1 RESEND 2/6] PCI/MSI: Dynamic allocation of MSI-X vectors by group

From: Jacob Keller
Date: Thu Jan 07 2021 - 17:31:30 EST


Hi Thomas,

On 6/29/2019 12:59 AM, Thomas Gleixner wrote:
> As already pointed out, that's overengineered.
>
> First of all this patch is doing too many things at once. These changes
> need to be split up in reviewable bits and pieces.
>

I am looking into this work because I want to implement the ability to
do grouped, partial allocations of MSI-X vectors over time in the ice
Linux NIC driver.

> But I consider this approach as a POC and not something which can be meant
> as a maintainable solution. It just duct tapes this new functionality into
> the existing code thereby breaking things left and right. And even if you
> can 'fix' these issues with more duct tape it won't be maintainable at all.
>
> If you want to support group based allocations, then the PCI/MSI facility
> has to be refactored from ground up.
>

I agree that this is the right direction to go, but I ran into some
trouble following these steps when I started trying to implement it.

> 1) Introduce the concept of groups by adding a group list head to struct
> pci_dev. Ideally you create a new struct pci_dev_msi or whatever where
> all this muck goes into.
>

The big problem I keep running into is that struct msi_desc is used by
several code paths that aren't PCI. It seems odd to refactor the
non-PCI bus code that uses struct msi_desc just to make it group
aware...
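
For reference, both the PCI and the non-PCI users walk the same
per-device list today through the generic iterator in
include/linux/msi.h (quoting the current code, if I'm reading it
right):

#define for_each_msi_entry(desc, dev)	\
	list_for_each_entry((desc), dev_to_msi_list((dev)), list)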

I'd appreciate any further thoughts you have on the right way forward
here. I think that treating vector allocations as groups is a huge
improvement, as it will make it easier to manage MSI-X vector
allocation without running into exhaustion issues caused by
over-allocating.

But does this need to be introduced as part of the generic linux/msi.h
code? Doing so means refactoring a bunch of code paths that don't seem
to care about grouping, yet I can't find a good way to handle the
grouping in just the PCI layer.
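
To make the question concrete, here is roughly the shape I keep
sketching; all of these names are placeholders rather than a proposal:

#include <linux/list.h>
#include <linux/mutex.h>

struct msi_entry_group {
	int			group_id;
	struct list_head	node;		/* links into pci_dev_msi::groups */
	struct list_head	entries;	/* the struct msi_desc entries */
	unsigned int		nvec;		/* number of vectors in this group */
};

struct pci_dev_msi {
	struct mutex		lock;		/* protects the groups list */
	struct list_head	groups;		/* list of msi_entry_group */
	int			next_group_id;	/* id 0 == legacy allocation */
};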

> 2) Change the existing code to treat the current allocation mode as a
> group allocation. Keep the entries in a new struct msi_entry_group and
> have a group id, list head and the entries in there.
>
> Take care of protecting the group list.
>
> Assign group id 0 and add the entry_group to the list in the pci device.
>
> Rework the related functions so they are group aware.
>
> This can be split into preparatory and functional pieces, i.e. multiple
> patches.
>

The locking here also seems problematic. Many of the paths that access
the msi list don't take a lock today, so any change that affects those
users would mean adding locking to all of these flows.

I guess for PCI code we could stop using dev->msi_list altogether and
instead use a PCI-specific struct pci_msi_group or something? Any flow
the PCI layer needs would then have to take the group structure instead
of, or in addition to, the device pointer... It's not clear to me how
much code actually crosses between the PCI and non-PCI usages of
struct msi_desc...
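
Something like this is what I'm imagining for the PCI-only path: the
group list gets its own lock while the legacy msi_list users stay
untouched. This is entirely hypothetical, including the pdev->dev_msi
field, and the lifetime of the returned group after the unlock is
hand-waved (a real version would need a refcount or a caller-held
lock):

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/pci.h>

/* Hypothetical lookup of a group by id under the group list lock. */
static struct msi_entry_group *
pci_msi_find_group(struct pci_dev *pdev, int group_id)
{
	struct pci_dev_msi *msi = pdev->dev_msi;	/* made-up field */
	struct msi_entry_group *grp, *found = NULL;

	mutex_lock(&msi->lock);
	list_for_each_entry(grp, &msi->groups, node) {
		if (grp->group_id == group_id) {
			found = grp;
			break;
		}
	}
	mutex_unlock(&msi->lock);

	return found;
}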

> 3) Split out the 'first time' enablement code into separate functions and
> store the relevant state in struct pci_dev_msi
>
> 4) Rename pci_alloc_irq_vectors_affinity() to
> pci_alloc_irq_vectors_affinity_group() and add a group_id pointer
> argument.
>
> Make pci_alloc_irq_vectors_affinity() a wrapper function which hands
> in a NULL group id pointer and does proper sanity checking.
>
> 5) Create similar constructs for related functionality
>
> 6) Enable the group allocation mode for subsequent allocations
>

The rest of the flow makes sense, but I have been struggling to find
the right abstraction for handling the msi_desc groups.
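
Step 4, for instance, seems straightforward; I'd expect the wrapper to
end up looking roughly like this (untested sketch, assuming the
group-aware variant just grows a group id pointer argument):

int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
				   unsigned int max_vecs, unsigned int flags,
				   struct irq_affinity *affd)
{
	/*
	 * A NULL group id pointer keeps today's single-group behavior;
	 * the sanity checking you mentioned is elided here.
	 */
	return pci_alloc_irq_vectors_affinity_group(dev, min_vecs, max_vecs,
						    flags, affd, NULL);
}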

Does my idea of converting the PCI layer code to its own structure (and
iterators, I guess?) instead of relying on msi_list make sense? Other
code could be converted to groups as well, but I have been trying to
find a path forward with minimal chance of breaking other users...

I'd appreciate any insight here.

Thanks,
Jake

> Thanks,
>
> tglx
>