Re: [PATCH 1/1] genirq/msi: Dynamic remove/add storage adapter hits EEH

From: Wen Xiong
Date: Wed Mar 19 2025 - 22:59:15 EST


> The real problem has nothing to do with a remove/add operation. The
> problem is solely in the probe function.

Hi Thomas,
Thanks for your suggestion!

I don't think we have a problem in the probe function, since this driver has been in production for many years.
Also, we didn't see the issue before the "MSI domain" patchset was merged into the Linux interrupt code (there is no issue in the RHEL 9.2 release).

Device reset is not called in the probe function, and we don't see the issue without the dynamic remove/add operation.
There is a small window during device reset in which the irqbalance daemon can kick in, so it took more than 6 hours to recreate the issue when running the remove/add loop.

We couldn't find a good way to fix the issue in either of the device drivers, so we are looking for some help in the interrupt code.

It looks like each irq_data has state flags (e.g. IRQD_AFFINITY_MANAGED). Could the device driver use one of the following flags during the reset?

* IRQD_NO_BALANCING - Balancing disabled for this IRQ
* IRQD_AFFINITY_ON_ACTIVATE - Affinity is set on activation. Don't
  call irq_chip::irq_set_affinity() when deactivated.

OR
If we register an affinity notifier in the device driver, can it tell us that the MSI-X vector has been cleared to 0 when the irqbalance daemon kicks in?

Thank you so much; I really appreciate any suggestions or help!
Wendy