Re: [PATCH kernel v9 18/32] powerpc/iommu/powernv: Release replaced TCE
From: David Gibson
Date: Thu Apr 30 2015 - 00:38:49 EST
On Wed, Apr 29, 2015 at 07:51:21PM +1000, Alexey Kardashevskiy wrote:
> On 04/29/2015 02:18 PM, David Gibson wrote:
> >On Sat, Apr 25, 2015 at 10:14:42PM +1000, Alexey Kardashevskiy wrote:
> >>At the moment writing new TCE value to the IOMMU table fails with EBUSY
> >>if there is a valid entry already. However PAPR specification allows
> >>the guest to write new TCE value without clearing it first.
> >>Another problem this patch is addressing is the use of pool locks for
> >>external IOMMU users such as VFIO. The pool locks are to protect
> >>DMA page allocator rather than entries and since the host kernel does
> >>not control what pages are in use, there is no point in pool locks and
> >>exchange()+put_page(oldtce) is sufficient to avoid possible races.
> >>This adds an exchange() callback to iommu_table_ops which does the same
> >>thing as set() plus it returns replaced TCE and DMA direction so
> >>the caller can release the pages afterwards. The exchange() receives
> >>a physical address unlike set() which receives linear mapping address;
> >>and returns a physical address as the clear() does.
> >>This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
> >>for a platform to have exchange() implemented in order to support VFIO.
> >>This replaces iommu_tce_build() and iommu_clear_tce() with
> >>a single iommu_tce_xchg().
> >>This makes sure that TCE permission bits are not set in TCE passed to
> >>IOMMU API as those are to be calculated by platform code from DMA direction.
> >>This moves SetPageDirty() to the IOMMU code to make it work for both
> >>VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
> >>available later).
> >>Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> >>[aw: for the vfio related changes]
> >>Acked-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >This looks mostly good, but there are couple of details that need fixing.
> >>diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> >>index ba75aa5..e8802ac 100644
> >>--- a/arch/powerpc/platforms/powernv/pci.c
> >>+++ b/arch/powerpc/platforms/powernv/pci.c
> >>@@ -598,6 +598,23 @@ int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
> >> return 0;
> >> }
> >>+#ifdef CONFIG_IOMMU_API
> >>+int pnv_tce_xchg(struct iommu_table *tbl, long index,
> >>+ unsigned long *tce, enum dma_data_direction *direction)
> >>+ u64 proto_tce = iommu_direction_to_tce_perm(*direction);
> >>+ unsigned long newtce = *tce | proto_tce;
> >>+ unsigned long idx = index - tbl->it_offset;
> >Should this have a BUG_ON or WARN_ON if the supplied tce has bits set
> >below the page mask?
> Why? The caller checks these bits, do we really need to duplicate it
Because this is the crunch point where bad bits will actually cause
strange stuff to happen.
As much as anything the point of a BUG_ON would be to document that
this function expects the parameter to be aligned.
> >>+ *tce = xchg(pnv_tce(tbl, idx), cpu_to_be64(newtce));
> >>+ *tce = be64_to_cpu(*tce);
> >>+ *direction = iommu_tce_direction(*tce);
> >>+ *tce &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
> >>+ return 0;
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
Description: PGP signature