I've already done an experimental change to have the driver manage the
affinity itself, and performance is generally much better:
https://github.com/hisilicon/kernel-dev/commit/e15bd404ed1086fed44da34ed3bd37a8433688a7
But I still think it's wise to consider only managed interrupts for now.
JFYI, about the NVMe CPU lockup issue, there are 2 pieces of work ongoing here:
https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@xxxxxxxxxx/T/#t
https://lore.kernel.org/linux-block/20191218071942.22336-1-ming.lei@xxxxxxxxxx/T/#t

I've also managed to trigger some of them now that I have access to
a decent box with nvme storage.
I only have 2x NVMe SSDs when this occurs - I should not be hitting this...
Out of curiosity, have you tried with the SMMU disabled? I'm wondering
whether we hit some livelock condition on unmapping buffers...
No, but I can give it a try. Doing that should lower the CPU usage,
though, so it may mask the issue - probably not.