Re: [PATCH] amd/iommu: do not split domain flushes when flushing the entire range
From: Weinan Liu
Date: Thu Apr 09 2026 - 04:15:45 EST
On Thu, Mar 26, 2026 19:05:12 -0300 Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> > On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > > We are hitting the following soft lockup in production on v6.6 and
> > > > v6.12, but the bug exists in all versions
> > > >
> > > > watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
> > > > CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> > > > RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
> > > > Call Trace:
> > > > <TASK>
> > > > amd_iommu_attach_device+0x69/0x450
> > > > __iommu_device_set_domain+0x7b/0x190
> > > > __iommu_group_set_core_domain+0x61/0xd0
> > > > iommu_detach_group+0x27/0x40
> > > > vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
> > > > vfio_group_detach_container+0x59/0x160 [vfio]
> > > > vfio_group_fops_release+0x4d/0x90 [vfio]
> > > > __fput+0x95/0x2a0
> > > > task_work_run+0x93/0xc0
> > > > do_exit+0x321/0x950
> > > > do_group_exit+0x7f/0xa0
> > > > get_signal+0x77d/0x780
> > > > </TASK>
> > > >
> > > > This occurs because we're running in a VM and we're splitting the
> > > > flush of size CMD_INV_IOMMU_ALL_PAGES_ADDRESS that we get from
> > > > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
> > >
> > > This function doesn't exist in the upstream kernel anymore, and the
> > > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > > all, AFAIK.
> >
> > This was based on linus/master as of March 4th, and we get here via
> > amd_iommu_flush_tlb_all, which definitely still exists, so what
> > specifically are you talking about? Thanks,
>
> $ git grep amd_iommu_domain_flush_tlb_pde | wc -l
> 0
>
> The entire page table logic was rewritten. The stuff that caused these
> issues is gone and the new stuff doesn't appear to have this bug of
> passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.
>
> If it does please explain it in terms of the new stuff without
> referencing deleted functions.
>
> I don't know how you get something like this into -stable.
I believe the function Josef is referring to on linus/master is amd_iommu_domain_flush_all().
https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1820
The potential call sequence appears to be:
```
blocked_domain_attach_device() or amd_iommu_attach_device()
  -> detach_device()
    -> amd_iommu_domain_flush_all()
      -> amd_iommu_domain_flush_pages(..., CMD_INV_IOMMU_ALL_PAGES_ADDRESS);
```
Based on the code in build_inv_address() [1], it doesn't make sense to break a flush of
the entire range into smaller sizes and perform multiple flushes: a chunk size larger
than 1 << 51 already means a full flush.
[1] https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1289