[PATCH] amd/iommu: do not split domain flushes when flushing the entire range

From: Josef Bacik

Date: Wed Mar 04 2026 - 16:31:02 EST


We are hitting the following soft lockup in production on v6.6 and
v6.12, but the bug exists in all versions:

watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
Call Trace:
<TASK>
amd_iommu_attach_device+0x69/0x450
__iommu_device_set_domain+0x7b/0x190
__iommu_group_set_core_domain+0x61/0xd0
iommu_detach_group+0x27/0x40
vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
vfio_group_detach_container+0x59/0x160 [vfio]
vfio_group_fops_release+0x4d/0x90 [vfio]
__fput+0x95/0x2a0
task_work_run+0x93/0xc0
do_exit+0x321/0x950
do_group_exit+0x7f/0xa0
get_signal+0x77d/0x780
</TASK>

This occurs because we're running in a VM, where amd_iommu_np_cache is
set, and we split the CMD_INV_IOMMU_ALL_PAGES_ADDRESS size that we get
from amd_iommu_domain_flush_tlb_pde() into a bunch of smaller, naturally
aligned flushes. Each of these flushes traps into the host, all while we
hold the domain lock with IRQs disabled.
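
To illustrate how many flushes the split can produce, here is a minimal
user-space sketch of the natural-alignment splitting. It is a model
written for this description, not the driver code: count_split_flushes(),
lowest_bit(), highest_bit() and ALL_PAGES_SIZE are hypothetical names,
the value of ALL_PAGES_SIZE is assumed to match
CMD_INV_IOMMU_ALL_PAGES_ADDRESS, and the GCC/Clang bit-scan builtins
stand in for the kernel's helpers.

```c
#include <stdint.h>

/* Assumed to match the kernel's "flush everything" size. */
#define ALL_PAGES_SIZE 0x7fffffffffffffffULL

/* Index of the lowest set bit; x must be non-zero. */
static int lowest_bit(uint64_t x)
{
	return __builtin_ctzll(x);
}

/* Index of the highest set bit; x must be non-zero. */
static int highest_bit(uint64_t x)
{
	return 63 - __builtin_clzll(x);
}

/*
 * Count how many naturally aligned flushes are needed to cover
 * [address, address + size). Each step flushes the smaller of the
 * address alignment and the largest power of two that fits in size.
 */
static unsigned int count_split_flushes(uint64_t address, uint64_t size)
{
	unsigned int flushes = 0;

	while (size != 0) {
		int align = highest_bit(size);
		uint64_t flush_size;

		/* Address 0 is aligned to everything, so only size limits it. */
		if (address != 0 && lowest_bit(address) < align)
			align = lowest_bit(address);

		flush_size = 1ULL << align;
		address += flush_size;
		size -= flush_size;
		flushes++;
	}

	return flushes;
}
```

Under this model, count_split_flushes(0, ALL_PAGES_SIZE) returns 63,
because every set bit of the size becomes its own flush, whereas an
already aligned request such as count_split_flushes(0x1000, 0x1000)
needs only one.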

Fix this by not splitting up this special size and sending the whole
command in as a single flush instead, so perhaps the host will decide to
be gracious and not spend 7 business years doing a flush.

Cc: stable@xxxxxxxxxxxxxxx
Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM")
Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
drivers/iommu/amd/iommu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 81c4d7733872..f0d3e06734ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1769,7 +1769,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 {
 	lockdep_assert_held(&domain->lock);
 
-	if (likely(!amd_iommu_np_cache)) {
+	if (likely(!amd_iommu_np_cache) ||
+	    size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS) {
 		__domain_flush_pages(domain, address, size);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
--
2.53.0