RE: [EXTERNAL] Re: [PATCH net-next v6 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
From: Long Li
Date: Mon May 04 2026 - 18:22:16 EST
> -----Original Message-----
> From: Simon Horman <horms@xxxxxxxxxx>
> Sent: Saturday, May 2, 2026 8:09 AM
> To: Long Li <longli@xxxxxxxxxxxxx>
> Cc: 'Simon Horman' <horms@xxxxxxxxxx>; Konstantin Taranov
> <kotaranov@xxxxxxxxxxxxx>; kuba@xxxxxxxxxx; davem@xxxxxxxxxxxxx;
> pabeni@xxxxxxxxxx; edumazet@xxxxxxxxxx; andrew+netdev@xxxxxxx;
> jgg@xxxxxxxx; leon@xxxxxxxxxx; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; KY
> Srinivasan <kys@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> <DECUI@xxxxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> linux-hyperv@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [PATCH net-next v6 2/6] net: mana: Query device
> capabilities and configure MSI-X sharing for EQs
>
> From: 'Simon Horman' <horms@xxxxxxxxxx>
>
> This is an AI-generated review of your patch. The human sending this email has
> considered the AI review valid, or at least plausible.
> Full review at:
> https://sashiko.d/
> ---
> net: mana: Query device capabilities and configure MSI-X sharing for EQs
>
> When querying the device, this adjusts the max number of queues to allow
> dedicated MSI-X vectors for each vPort, clamping to no less than
> MANA_DEF_NUM_QUEUES. MSI-X sharing among vPorts is disabled by default
> and enabled only when there are not enough vectors. It also renames
> mana_query_device_cfg() to mana_gd_query_device_cfg().
>
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>
> [ ... ]
>
> > + /*
> > + * Adjust gc->max_num_queues returned from the SOC to allow dedicated
> > + * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
> > + */
> > + max_num_queues = (gc->num_msix_usable - 1) / num_ports;
> > + max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
>
> Does rounding up here inflate max_num_queues and unnecessarily force MSI-X
> sharing?
>
> For example, if there are 63 usable MSIs and 2 ports, the division yields 31.
> Rounding up gives 32. The subsequent check below:
>
> > + /*
> > + * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
> > + * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
> > + */
> > + max_num_queues = min(gc->max_num_queues, max_num_queues);
> > + if (max_num_queues * num_ports > gc->num_msix_usable - 1)
> > + gc->msi_sharing = true;
>
> would then evaluate to true (32 * 2 > 63) and force the driver into MSI-X shared
> mode. This seems to contradict the intent to use dedicated MSI-X whenever
> possible.
>
> Would it be better to use rounddown_pow_of_two() instead to ensure the
> calculated queues fit within the available dedicated vectors?
The roundup is intentional. The primary goal is to maximize the number of queues per vPort for throughput: power-of-2 queue counts align with RSS indirection table sizes and give the best distribution. Dedicated MSI-X is preferred when it comes for free, but not at the cost of halving the queue count.

In your example (63 MSIs, 2 ports), rounddown would give 16 queues per port, nearly half the 31 that the hardware could support. Roundup gives 32 and triggers sharing mode, but sharing mode is designed to work correctly and the queues still get serviced. The driver trades per-EQ interrupt exclusivity for better parallelism, which is the right tradeoff for network throughput on these devices.

The sharing check exists as a fallback for exactly this situation; it is not a failure mode but a deliberate operating point.
>
> [ ... ]
>
> > @@ -1968,20 +2008,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
> > if (err)
> > goto destroy_hwc;
> >
> > - err = mana_gd_query_max_resources(pdev);
> > + err = mana_gd_detect_devices(pdev);
> > if (err)
> > goto destroy_hwc;
> >
> > - err = mana_gd_setup_remaining_irqs(pdev);
> > - if (err) {
> > - dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> > - goto destroy_hwc;
> > - }
> > -
> > - err = mana_gd_detect_devices(pdev);
> > + err = mana_gd_query_max_resources(pdev);
> > if (err)
> > goto destroy_hwc;
> >
> > + if (!gc->msi_sharing) {
> > + gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
> > + if (!gc->msi_bitmap) {
> > + err = -ENOMEM;
> > + goto destroy_hwc;
> > + }
> > + /* Set bit for HWC */
> > + set_bit(0, gc->msi_bitmap);
> > + } else {
> > + err = mana_gd_setup_remaining_irqs(pdev);
> > + if (err) {
> > + dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
> > + goto destroy_hwc;
> > + }
> > + }
>
> If msi_sharing is disabled, we allocate the msi_bitmap but skip calling
> mana_gd_setup_remaining_irqs().
>
> Since mana_gd_setup_hwc_irqs() only allocates a single vector for the hardware
> channel when dynamic allocation is supported, does this leave the device without
> interrupts for its Ethernet queues?
>
> If so, it seems this could lead to queue creation failures when the driver attempts
> to map uninitialized vectors. I notice this is fixed in a later patch in the series
> ("net: mana: Allocate interrupt context for each EQ when creating vPort"), but
> does leaving it out here break bisectability?
You're right that with pci_msix_can_alloc_dyn() returning true on MANA, the non-sharing path is reachable between patches 2 and 5. In that window, mana_gd_register_irq() would fail with -EINVAL because irq_contexts for indices 1+ haven't been populated yet.

However, the failure is contained: mana_create_eq() returns an error, mana_alloc_queues() propagates it, and the interface fails to come up cleanly. There is no WARN, no crash, no data corruption. The driver remains in a consistent state and succeeds once the full series is applied.

This is a new capability being built up across the series. The dedicated MSI-X mode did not exist before, so there is no regression from the pre-patch baseline: the pre-patch code always went through mana_gd_setup_remaining_irqs() and operated in what is now called sharing mode. Restructuring the series to make non-sharing mode functional at each intermediate commit would require squashing the GIC infrastructure (patches 3-4) into this patch, producing a single large change that is significantly harder to review.

I'd prefer to keep the logical separation as-is. If you feel strongly about strict bisectability, I could add a fallback in this patch that forces msi_sharing = true when the GIC allocator is not yet available, and have patch 5 remove it, but that adds throwaway code to an intermediate commit.