Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture

From: Robin Murphy
Date: Tue Mar 15 2016 - 08:34:06 EST

Hi Marek, Arnd,

On 19/02/16 10:30, Arnd Bergmann wrote:
On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
A side effect of this change is a switch from bitmap-based IO address space
management to tree-based code. There should be no functional changes
for drivers that rely on initialization through the generic
arch_setup_dma_ops() interface. Code that used the old arm_iommu_* functions
must be updated to the new interface.

Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
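
For readers unfamiliar with the bitmap-based scheme mentioned in the commit
message, here is a minimal userspace sketch of first-fit IO address
allocation over a bitmap. It is not the ARM kernel code: the names
bitmap_iova, bitmap_iova_alloc, bitmap_iova_free and the 64-page address
space are all made up for illustration. A tree-based allocator instead
tracks allocated ranges as nodes, so its cost scales with the number of
live mappings rather than with the size of the address space.

```c
#include <assert.h>
#include <stdint.h>

#define IOVA_PAGES 64	/* hypothetical: toy IO address space of 64 pages */

struct bitmap_iova {
	uint64_t used;	/* one bit per page, 1 = allocated */
};

/* Build a mask with the low `count` bits set (count is 1..64). */
static uint64_t low_mask(unsigned int count)
{
	return (count == IOVA_PAGES) ? ~0ULL : ((1ULL << count) - 1);
}

/* First-fit search for `count` contiguous free pages.
 * Returns the first page index, or -1 if no run is large enough. */
static int bitmap_iova_alloc(struct bitmap_iova *d, unsigned int count)
{
	assert(count > 0 && count <= IOVA_PAGES);
	uint64_t mask = low_mask(count);

	for (unsigned int start = 0; start + count <= IOVA_PAGES; start++) {
		if ((d->used & (mask << start)) == 0) {
			d->used |= mask << start;
			return (int)start;
		}
	}
	return -1;
}

/* Release `count` pages starting at page index `start`. */
static void bitmap_iova_free(struct bitmap_iova *d, unsigned int start,
			     unsigned int count)
{
	assert(count > 0 && start + count <= IOVA_PAGES);
	d->used &= ~(low_mask(count) << start);
}
```

The linear scan over every candidate start position is what makes this
approach simple but sensitive to the size of the address space.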

I like the overall idea. However, this interface from the iommu
subsystem into architecture specific code:

+/*
+ * The DMA API is built upon the notion of "buffer ownership". A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device. These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches. We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ */
+extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
+ enum dma_data_direction);
+extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
+ enum dma_data_direction);
+static inline void arch_flush_page(struct device *dev, const void *virt,
+				   phys_addr_t phys)
+{
+	dmac_flush_range(virt, virt + PAGE_SIZE);
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					   pgprot_t prot, bool coherent)
+{
+	if (coherent)
+		return prot;
+	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			pgprot_writecombine(prot) :
+			pgprot_dmacoherent(prot);
+	return prot;
+}
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+ gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);

doesn't feel completely right yet. In particular the arch_flush_page()
interface is probably still too specific to ARM/ARM64 and won't work
that way on other architectures.
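
The "buffer ownership" transitions described in the quoted comment can be
modelled in a few lines of userspace C. This is only a sketch under stated
assumptions: the dma_buf_model type and the model_* functions are
hypothetical, cache operations are reduced to counters, and the sketch
always cleans before a transfer as the comment says, whereas real
implementations vary the operation by transfer direction.

```c
#include <assert.h>

enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_BIDIRECTIONAL };
enum owner { OWNER_CPU, OWNER_DEVICE };

/* Hypothetical model of one DMA buffer; counters stand in for real
 * cache maintenance operations. */
struct dma_buf_model {
	enum owner owner;
	int cleans;		/* write-backs done before a transfer */
	int invalidates;	/* invalidations done after a transfer */
};

/* CPU -> device: write back dirty lines before the device touches memory. */
static void model_cpu_to_dev(struct dma_buf_model *b, enum dma_dir dir)
{
	assert(b->owner == OWNER_CPU);
	(void)dir;		/* simplification: always clean */
	b->cleans++;
	b->owner = OWNER_DEVICE;
}

/* Device -> CPU: invalidation is delayed until the transfer has completed,
 * so lines speculatively fetched during the transfer are discarded. Nothing
 * needs discarding if the device only read the buffer. */
static void model_dev_to_cpu(struct dma_buf_model *b, enum dma_dir dir)
{
	assert(b->owner == OWNER_DEVICE);
	if (dir != DIR_TO_DEVICE)
		b->invalidates++;
	b->owner = OWNER_CPU;
}
```

The asserts encode the exclusivity rule: a transition is only valid when
the side giving up the buffer currently owns it.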

I think it would be better to make this either more generic or less generic:

a) leave the iommu_dma_map_ops definition in the architecture-specific
code, but make it call helper functions in drivers/iommu to do all
of the really generic parts.

This was certainly the original intent of the arm64 code. The division of responsibility there is a conscious decision - IOMMU-API-wrangling goes in the common code, cache maintenance and actual dma_map_ops stay hidden in architecture-private code, safe from abuse. It's very much modelled on SWIOTLB.
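
To make that division of responsibility concrete, here is a rough userspace sketch of the layering, not the actual arm64 code. Every name in it (common_domain, common_iommu_map, arch_dev, arch_cache_clean, arch_map_page) is hypothetical, the "IOMMU" is a bump allocator, and cache maintenance is a counter. The point is only that the common helper knows nothing about caches, while the arch-private wrapper owns both the cache maintenance and the entry point a driver would reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096u		/* hypothetical page size for the sketch */

typedef uint64_t phys_t;	/* stand-in for phys_addr_t */
typedef uint64_t iova_t;	/* stand-in for dma_addr_t */

/* ---- "common" layer: IO address bookkeeping only, no cache knowledge ---- */
struct common_domain {
	iova_t next_iova;	/* next free page-aligned IO address */
};

static iova_t common_iommu_map(struct common_domain *d, phys_t phys,
			       uint64_t size)
{
	assert(size > 0);
	uint64_t offset = phys % PAGE_SZ;
	/* Round the mapped span up to whole pages. */
	uint64_t span = (offset + size + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
	iova_t base = d->next_iova;

	d->next_iova += span;
	return base + offset;
}

/* ---- "arch-private" layer: cache maintenance plus the map operation ---- */
struct arch_dev {
	struct common_domain domain;
	bool coherent;
	int cache_cleans;	/* counts simulated cache maintenance */
};

static void arch_cache_clean(struct arch_dev *dev, phys_t phys, uint64_t size)
{
	(void)phys;
	(void)size;
	dev->cache_cleans++;
}

/* What a driver would ultimately reach through the DMA API. */
static iova_t arch_map_page(struct arch_dev *dev, phys_t phys, uint64_t size)
{
	iova_t iova = common_iommu_map(&dev->domain, phys, size);

	if (!dev->coherent)
		arch_cache_clean(dev, phys, size);
	return iova;
}
```

Nothing outside the arch layer can call arch_cache_clean() directly here, which is the kind of "safe from abuse" property the split is meant to preserve.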

Given all the work Russell did last year getting rid of direct uses of the dmac_* cache maintenance functions by ARM drivers, I don't think bringing all of that back is a good way to go. Personally, I'd much rather see several dozen lines of very similar-looking arch-private code (differing only in the highmem and outer cache handling) if it maintains a robust and clearly defined abstraction and avoids yet another level of indirection. It does also seem a little odd to factor out only half the file on the grounds of architectural similarity, when that argument applies equally to the other (non-IOMMU) half too. I think the recent tree-wide conversion to generic dma_map_ops was in part motivated by the thought of common implementations, so I'm sure that's something we can revisit in due course.


b) clarify that this is only applicable to arch/arm and arch/arm64, and
unify things further between these two, as they have very similar
requirements in the CPU architecture.
