Re: [PATCH v3 3/3] PCI: dwc: Enable MSI affinity support

From: Radu Rendec

Date: Mon May 25 2026 - 12:57:41 EST

Hi Brian,

On Fri, 2026-05-22 at 17:07 -0700, Brian Norris wrote:
> (Updating Radu's email; dropping another bouncing email)

Thanks for doing that! Obviously, I no longer have access to the email
address that was used to post the patch, and I was lazy in setting up
scripts that follow the mailing lists to catch messages that are
addressed to me directly but using old email addresses.

> On Fri, May 22, 2026 at 01:27:43PM -0700, Brian Norris wrote:
> > I'll see if I can learn anything more here on my own, but I figured I'd
> > report it in case you have any thoughts or leads I should investigate.

Thanks for reporting it! I do not have any thoughts or leads yet, but I
do plan to look at it during the next few days and hopefully come up
with something. I also apologize for the slowness in my replies.

> In an hour or two of poking, all I've learned so far is that the problem
> also seems to go away if I:
>
> (a) add a few dump_stack() and other noisy logs to a few key places (for
>     now, __pci_write_msi_msg(), pci_power_up() failures, and
>     irq_chip_redirect_set_affinity() -- I think __pci_write_msi_msg()
>     was the most significant, possibly because it produced the most log
>     text) and
>
> (b) leave a 115200 baud UART kernel console running.
>
> (This is on a sample size of 20+ suspend cycles, whereas previous
> bisection would fail 100%.)
>
> It then reappears when I quiet the kernel logging a bit with `dmesg -n3`.
>
> I think that simply tells me that there's some timing issue or race
> condition involved.

That's very useful! Interrupts are migrated on suspend to the boot CPU
and then migrated back on resume, and the ordering and synchronization
around that is tricky. The stack trace in your previous message tells
me that the nvme driver is waiting for IO completion, which is normally
signaled by an interrupt, except that interrupt never arrives.

With my patch included, the demultiplexed interrupt (the nvme interrupt
in this case) has an opportunity to be migrated during suspend/resume,
whereas previously it did not. That's one more moving part, and I'll
have to look closer at the code and think about what could go wrong. I agree
it's likely a race condition or a timing issue because it works with
that extra logging, which adds small delays as a side effect.
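
One way to check from user space whether the nvme interrupt really stops
arriving after resume is to compare two snapshots of /proc/interrupts.
The following is only a minimal sketch and not part of the patch: the
helper names are made up for illustration, and it assumes the queue
interrupts carry "nvme" in their description and that the system stays
responsive enough to read /proc/interrupts after resume.

```python
def snapshot(path="/proc/interrupts"):
    """Return the current contents of /proc/interrupts as text."""
    with open(path) as f:
        return f.read()


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR) have fewer counters than there are CPUs,
        # so stop at the first token that is not a plain number.
        for tok in fields[:len(cpus)]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, desc)
    return table


def delivered_since(before, after, needle="nvme"):
    """Per-CPU count deltas for interrupts whose description contains needle."""
    deltas = {}
    for irq, (counts, desc) in after.items():
        if needle not in desc:
            continue
        old_counts = before.get(irq, ([0] * len(counts), desc))[0]
        deltas[irq] = [new - old for new, old in zip(counts, old_counts)]
    return deltas
```

Taking one snapshot() before suspend and another after resume, then
passing the parsed results to delivered_since(), shows on which CPU (if
any) each nvme interrupt was counted in between. A queue whose delta
stays at zero while the driver is waiting for completion would be
consistent with the interrupt never arriving, but it would not by itself
say where it got lost.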

Regards,
Radu