Re: [PATCH 1/1] genirq/msi: Dynamic remove/add storage adapter hits EEH

From: Wen Xiong
Date: Thu Mar 27 2025 - 17:37:14 EST




What about tearing down resources first and then issuing the reset?

This SAS adapter supports a dual-controller configuration. Normally we have two adapters in a system.
We configure one of them as the primary adapter and the other as the secondary adapter.
When a remove operation is done on the primary adapter, the adapter firmware fails over to the secondary adapter and configures it as primary. During the failover, the firmware requests a reset of the secondary adapter and then sets it as the primary adapter.

The secondary adapter failover triggers an adapter reset (ipr_reset_get_unit_check_job()).

[ 940.742698] ipr 0206:a0:00.0: 9070: IOA requested reset -> FW requested
[ 940.742733] ipr 0206:a0:00.0: Adapter to Adapter Link Failed Due to SAS Fabric Change [PRC: 17101C25]
[ 940.742768] ipr 0206:a0:00.0: Remote IOA VPID/SN: IBM 57B4001SISIOA 00458021

When the secondary adapter is being reset, we use the same code path as the remove operation. We can't free the IRQs for the secondary adapter there, since the kernel has already assigned them to it.

Actually, we discussed calling pci_free_irq_vectors() before doing the BIST reset when we were trying to fix this in the device driver.

That might cause other problems. It is also not what a user would expect. For example, if they disabled irqbalance and manually set up IRQ binding and affinity, freeing and reallocating the interrupts across a reset would wipe out those changes, which they would not expect.
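
To illustrate the concern, here is a small userspace model in C. It is only a sketch and not driver or kernel code: every name in it (toy_adapter, toy_alloc_vectors(), DEFAULT_AFFINITY, and so on) is hypothetical, and it assumes that a newly allocated vector starts out with a default affinity mask. It compares a reset that keeps the vectors with one that frees and reallocates them.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_VECTORS       4
#define DEFAULT_AFFINITY 0xFFu  /* assumption: a new vector may run on any of 8 CPUs */

/* Toy model of one interrupt vector; not a kernel data structure. */
struct toy_vector {
    bool allocated;
    unsigned int affinity;      /* bitmask of CPUs the vector may target */
};

struct toy_adapter {
    struct toy_vector vec[NR_VECTORS];
    unsigned int resets;        /* number of resets performed */
};

/* Allocation hands out vectors with the default affinity. */
void toy_alloc_vectors(struct toy_adapter *a)
{
    for (size_t i = 0; i < NR_VECTORS; i++) {
        a->vec[i].allocated = true;
        a->vec[i].affinity = DEFAULT_AFFINITY;
    }
}

/* Freeing a vector discards any per-vector state, including affinity. */
void toy_free_vectors(struct toy_adapter *a)
{
    for (size_t i = 0; i < NR_VECTORS; i++) {
        a->vec[i].allocated = false;
        a->vec[i].affinity = 0;
    }
}

/* Stands in for an administrator binding a vector to specific CPUs. */
void toy_set_affinity(struct toy_adapter *a, size_t idx, unsigned int mask)
{
    if (idx < NR_VECTORS && a->vec[idx].allocated)
        a->vec[idx].affinity = mask;
}

/* Reset that leaves the vectors alone: manual affinity survives. */
void toy_reset_keep_vectors(struct toy_adapter *a)
{
    a->resets++;
}

/* Reset that frees and reallocates: manual affinity is lost. */
void toy_reset_realloc_vectors(struct toy_adapter *a)
{
    toy_free_vectors(a);
    a->resets++;
    toy_alloc_vectors(a);
}
```

In this model, a mask set with toy_set_affinity() is still in place after toy_reset_keep_vectors(), but is back to DEFAULT_AFFINITY after toy_reset_realloc_vectors(). That is the behavior change a user with manual IRQ binding would see.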

Thanks,
Wendy