Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support

From: Jon Hunter

Date: Mon Jan 26 2026 - 17:07:15 EST


Hi Thomas,

On 26/01/2026 07:59, Thomas Gleixner wrote:
On Thu, Jan 22 2026 at 18:31, Radu Rendec wrote:
The CPUs are taken offline one by one, starting with CPU 7. The code in
question runs on the dying CPU, with hardware interrupts disabled on all
CPUs. The (simplified) call stack looks like this:

irq_migrate_all_off_this_cpu
  for_each_active_irq
    migrate_one_irq
      irq_do_set_affinity
        irq_chip_redirect_set_affinity (via chip->irq_set_affinity)

The debug patch I gave you adds:
* a printk to irq_chip_redirect_set_affinity (which is very small)
* a printk at the beginning of migrate_one_irq

Also, the call to irq_do_set_affinity is almost the last thing that
happens in migrate_one_irq, and that for_each_active_irq loop is quite
small too. So, there isn't much happening between the printk in
irq_chip_redirect_set_affinity for the msi irq (which we do see in the
log) and the printk in migrate_one_irq for the next irq (which we don't
see).

This doesn't make any sense at all. irq_chip_redirect_set_affinity() only
accesses memory associated with the interrupt descriptor, and the new
redirection CPU is the same as the previous one: the mask changes from
0xff to 0x7f, so cpumask_first() yields 0 in both cases.

According to the provided dmesg, this happens on linux-next.

Jon, can you please validate that this happens as well on

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/msi


I tried that branch and I see suspend failing there too. If I revert this
change on top of your branch or -next, I don't see any problems.

Thanks
Jon

--
nvpublic