Re: [PATCH 4/5] vfio/type1: Flush CPU caches on DMA pages in non-coherent domains

From: Yan Zhao
Date: Fri May 17 2024 - 01:01:47 EST


On Thu, May 16, 2024 at 10:44:42PM -0600, Alex Williamson wrote:
> On Fri, 17 May 2024 11:11:48 +0800
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> > On Thu, May 16, 2024 at 02:50:09PM -0600, Alex Williamson wrote:
> > > On Mon, 13 May 2024 15:11:28 +0800
> > > Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > > On Fri, May 10, 2024 at 10:57:28AM -0600, Alex Williamson wrote:
> > > > > On Fri, 10 May 2024 18:31:13 +0800
> > > > > Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > > > >
> > > > > > On Thu, May 09, 2024 at 12:10:49PM -0600, Alex Williamson wrote:
> > > > > > > On Tue, 7 May 2024 14:21:38 +0800
> > > > > > > Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > ...
> > > > > > > > @@ -1683,9 +1715,14 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> > > > > > > > for (; n; n = rb_next(n)) {
> > > > > > > > struct vfio_dma *dma;
> > > > > > > > dma_addr_t iova;
> > > > > > > > + bool cache_flush_required;
> > > > > > > >
> > > > > > > > dma = rb_entry(n, struct vfio_dma, node);
> > > > > > > > iova = dma->iova;
> > > > > > > > + cache_flush_required = !domain->enforce_cache_coherency &&
> > > > > > > > + !dma->cache_flush_required;
> > > > > > > > + if (cache_flush_required)
> > > > > > > > + dma->cache_flush_required = true;
> > > > > > >
> > > > > > > The variable name here isn't accurate and the logic is confusing. If
> > > > > > > the domain does not enforce coherency and the mapping is not tagged as
> > > > > > > requiring a cache flush, then we need to mark the mapping as requiring
> > > > > > > a cache flush. So the variable state is something more akin to
> > > > > > > set_cache_flush_required. But all we're saving with this is a
> > > > > > > redundant set if the mapping is already tagged as requiring a cache
> > > > > > > flush, so it could really be simplified to:
> > > > > > >
> > > > > > > dma->cache_flush_required = !domain->enforce_cache_coherency;
> > > > > > Sorry about the confusion.
> > > > > >
> > > > > > If dma->cache_flush_required is set to true by a domain not enforcing cache
> > > > > > coherency, we hope it will not be reset to false by a later attaching to domain
> > > > > > enforcing cache coherency due to the lazily flushing design.
> > > > >
> > > > > Right, ok, the vfio_dma objects are shared between domains so we never
> > > > > want to set 'dma->cache_flush_required = false' due to the addition of a
> > > > > 'domain->enforce_cache_coherent == true'. So this could be:
> > > > >
> > > > > if (!dma->cache_flush_required)
> > > > > dma->cache_flush_required = !domain->enforce_cache_coherency;
> > > >
> > > > Though this code is easier for understanding, it leads to unnecessary setting of
> > > > dma->cache_flush_required to false, given domain->enforce_cache_coherency is
> > > > true at the most time.
> > >
> > > I don't really see that as an issue, but the variable name originally
> > > chosen above, cache_flush_required, also doesn't convey that it's only
> > > attempting to set the value if it wasn't previously set and is now
> > > required by a noncoherent domain.
> > Agreed, the old name is too vague.
> > What about update_to_noncoherent_required?
>
> set_noncoherent? Thanks,
>
Concise!

>
> > Then in vfio_iommu_replay(), it's like
> >
> > update_to_noncoherent_required = !domain->enforce_cache_coherency && !dma->is_noncoherent;
> > if (update_to_noncoherent_required)
> > dma->is_noncoherent = true;
> >
> > ...
> > if (update_to_noncoherent_required)
> > arch_flush_cache_phys((phys, size);
> > >
> > > > > > > It might add more clarity to just name the mapping flag
> > > > > > > dma->mapped_noncoherent.
> > > > > >
> > > > > > The dma->cache_flush_required is to mark whether pages in a vfio_dma requires
> > > > > > cache flush in the subsequence mapping into the first non-coherent domain
> > > > > > and page unpinning.
> > > > >
> > > > > How do we arrive at a sequence where we have dma->cache_flush_required
> > > > > that isn't the result of being mapped into a domain with
> > > > > !domain->enforce_cache_coherency?
> > > > Hmm, dma->cache_flush_required IS the result of being mapped into a domain with
> > > > !domain->enforce_cache_coherency.
> > > > My concern only arrives from the actual code sequence, i.e.
> > > > dma->cache_flush_required is set to true before the actual mapping.
> > > >
> > > > If we rename it to dma->mapped_noncoherent and only set it to true after the
> > > > actual successful mapping, it would lead to more code to handle flushing for the
> > > > unwind case.
> > > > Currently, flush for unwind is handled centrally in vfio_unpin_pages_remote()
> > > > by checking dma->cache_flush_required, which is true even before a full
> > > > successful mapping, so we won't miss flush on any pages that are mapped into a
> > > > non-coherent domain in a short window.
> > >
> > > I don't think we need to be so literal that "mapped_noncoherent" can
> > > only be set after the vfio_dma is fully mapped to a noncoherent domain,
> > > but also we can come up with other names for the flag. Perhaps
> > > "is_noncoherent". My suggestion was more from the perspective of what
> > > does the flag represent rather than what we intend to do as a result of
> > > the flag being set. Thanks,
> > Makes sense!
> > I like the name "is_noncoherent" :)
> >
>