Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity

From: John Garry
Date: Thu Jul 10 2025 - 04:13:10 EST


On 10/07/2025 07:49, Sagar.Biradar@xxxxxxxxxxxxx wrote:

+ Daniel, hch

This patch fixes a bug in the original path that caused I/O hangs. The
I/O hangs occurred because an MSI-X vector did not have a mapped online
CPU upon receiving a completion.

This patch enables Multi-Q support in the aacraid driver. Multi-Q support
in the driver is needed to support CPU offlining.
I assume that you mean "safe" CPU offlining.

It seems to me that in all cases we use queue interrupt affinity
spreading and managed interrupts for MSIX, right?

See aac_define_int_mode() -> pci_alloc_irq_vectors(..., PCI_IRQ_MSIX |
PCI_IRQ_AFFINITY);

But then for this non-Multi-Q support, the queue seems to be chosen
based on a round-robin approach in the driver. That round-robin comes
from how fib.vector_no is assigned in aac_fib_vector_assign(). If this
is the case, then why are managed interrupts being used for this
non-Multi-Q support at all?

I may be wrong about this. That driver is hard to understand with so
many knobs.


Thank you very much for raising this. You're right that using
PCI_IRQ_AFFINITY in non-Multi-Q mode doesn't offer real value, since
the driver doesn't utilize the affinity mapping. That said, the current
implementation is functionally correct and consistent with the driver's
historical behavior.

For some time now this driver has had issues in this area, so it is strange to say that the behavior is correct.

Indeed, this patch is trying to fix the broken behaviour regarding CPU hotplug, right?

To keep the patch focused and avoid scope creep, I’d prefer to leave the affinity flag logic as is for now.

To me, it would first be good to stop using PCI_IRQ_AFFINITY - that should fix any broken behaviour regarding CPU hotplug.

Then, if you still want to use PCI_IRQ_AFFINITY, do it like it is done in this patch.


I'd be happy to follow up with a small cleanup patch, sometime in the future, to improve this if you think it would help.

Daniel (cc'ed) is working on a method to isolate CPUs so that the CPUs used for queue mapping can be configured for tuning performance; that way you don't need Kconfig options like the one in this patch.