RE: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
From: Tian, Kevin
Date: Wed Jun 01 2022 - 04:53:26 EST
> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Wednesday, June 1, 2022 7:11 AM
>
> On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:
>
> > There are only 3 instances where we'll free a table while the domain is
> > live. The first is the one legitimate race condition, where two map requests
> > targeting relatively nearby PTEs both go to fill in an intermediate level of
> > table; whoever loses that race frees the table they allocated, but it was
> > never visible to anyone else so that's definitely fine. The second is if
> > we're mapping a block entry, and find that there's already a table entry
> > there, wherein we assume the table must be empty, clear the entry,
> > invalidate any walk caches, install the block entry, then free the orphaned
> > table; since we're mapping the entire IOVA range covered by that table,
> > there should be no other operations on that IOVA range attempting to walk
> > the table at the same time, so it's fine.
>
> I saw these two in the Intel driver
>
> > The third is effectively the inverse, if we get a block-sized unmap
> > but find a table entry rather than a block at that point (on the
> > assumption that it's de-facto allowed for a single unmap to cover
> > multiple adjacent mappings as long as it does so exactly); similarly
> > we assume that the table must be full, and no other operations
> > should be racing because we're unmapping its whole IOVA range, so we
> > remove the table entry, invalidate, and free as before.
>
> Not sure I noticed this one though
>
> This all makes sense though.

The Intel driver also does this. See dma_pte_clear_level():

	/* If range covers entire pagetable, free it */
	if (start_pfn <= level_pfn &&
	    last_pfn >= level_pfn + level_size(level) - 1) {
		...
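
To make the condition above concrete, here is a minimal userspace sketch of the "range covers the entire pagetable" test. It is not the driver code: LEVEL_STRIDE = 9 (512 entries per table) is an assumption, level_size() is simplified for illustration, and level_base_pfn() and range_covers_entry() are hypothetical names that do not exist in the driver.

```c
#include <stdbool.h>

/* Assumption: 9 address bits per level, i.e. 512 entries per table. */
#define LEVEL_STRIDE 9

/* Number of 4K page frames spanned by one entry at the given level.
 * Level 1 entries map a single page, level 2 entries span 512 pages. */
static unsigned long level_size(int level)
{
	return 1UL << ((level - 1) * LEVEL_STRIDE);
}

/* Hypothetical helper: first pfn of the region that the entry
 * containing 'pfn' spans at the given level. */
static unsigned long level_base_pfn(unsigned long pfn, int level)
{
	return pfn & ~(level_size(level) - 1);
}

/* Hypothetical helper: true if [start_pfn, last_pfn] covers the whole
 * region of the entry containing 'pfn', so the lower-level table hanging
 * off that entry could be freed rather than cleared PTE by PTE. */
static bool range_covers_entry(unsigned long start_pfn,
			       unsigned long last_pfn,
			       unsigned long pfn, int level)
{
	unsigned long level_pfn = level_base_pfn(pfn, level);

	return start_pfn <= level_pfn &&
	       last_pfn >= level_pfn + level_size(level) - 1;
}
```

With these assumptions, an unmap of pfns 0..511 covers the first level-2 entry in full, while 0..510 does not and would have to descend into the table instead.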