Re: [PATCH] amd/iommu: do not split domain flushes when flushing the entire range
From: Jason Gunthorpe
Date: Thu Mar 12 2026 - 09:44:33 EST
On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> We are hitting the following soft lockup in production on v6.6 and
> v6.12, but the bug exists in all versions:
>
> watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
> CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
> Call Trace:
> <TASK>
> amd_iommu_attach_device+0x69/0x450
> __iommu_device_set_domain+0x7b/0x190
> __iommu_group_set_core_domain+0x61/0xd0
> iommu_detach_group+0x27/0x40
> vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
> vfio_group_detach_container+0x59/0x160 [vfio]
> vfio_group_fops_release+0x4d/0x90 [vfio]
> __fput+0x95/0x2a0
> task_work_run+0x93/0xc0
> do_exit+0x321/0x950
> do_group_exit+0x7f/0xa0
> get_signal+0x77d/0x780
> </TASK>
>
> This occurs because we're a VM, and we're splitting the
> CMD_INV_IOMMU_ALL_PAGES_ADDRESS size we get from
> amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.

This function doesn't exist in the upstream kernel anymore, and the
new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
all, AFAIK.
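
The splitting described in the quoted report can be modeled outside the
kernel. The sketch below is a hypothetical userspace C model, not the
driver code. It assumes that the VM path covers the flush range with
naturally aligned power-of-two chunks (one invalidation command per
chunk), that a full-domain flush starts at address 0, and that
CMD_INV_IOMMU_ALL_PAGES_ADDRESS is 0x7fffffffffffffff. The names
split_flush_count() and flush_count() are illustrative only. The model
shows one full-domain flush turning into dozens of commands, and how
special-casing the sentinel, as the patch subject describes, sends
exactly one.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sentinel size meaning "invalidate every page in the domain".
 * Assumption: mirrors the value in the AMD IOMMU driver headers.
 */
#define CMD_INV_IOMMU_ALL_PAGES_ADDRESS 0x7fffffffffffffffULL

/*
 * Hypothetical model of the split loop: cover [address, address + size)
 * with naturally aligned power-of-two chunks and count one command per
 * chunk. Each chunk is limited by both the alignment of the current
 * address and the largest power of two that still fits in the size.
 */
static unsigned int split_flush_count(uint64_t address, uint64_t size)
{
	unsigned int cmds = 0;

	while (size != 0) {
		int min_align = 63 - __builtin_clzll(size);      /* like __fls() */

		if (address != 0) {
			int addr_align = __builtin_ctzll(address); /* like __ffs() */
			if (addr_align < min_align)
				min_align = addr_align;
		}

		uint64_t chunk = 1ULL << min_align;
		assert(chunk <= size);  /* chunk never overruns the range */
		cmds++;                 /* stands in for one flush command */
		address += chunk;
		size -= chunk;
	}
	return cmds;
}

/* Patched behaviour (model): never split a full-domain flush. */
static unsigned int flush_count(uint64_t address, uint64_t size)
{
	if (size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS)
		return 1;
	return split_flush_count(address, size);
}
```

In this model, split_flush_count(0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS)
yields 63 commands (chunks of 2^62, 2^61, ..., 2^0), while flush_count()
with the same arguments yields 1. An ordinary range such as
flush_count(0x1000, 0x3000) is still split into two aligned commands, so
only the full-range case changes.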

Your patch makes sense, but it needs to go to stable only somehow.

Jason