Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

From: Jason Gunthorpe
Date: Wed Sep 22 2021 - 13:04:42 EST


On Tue, Sep 21, 2021 at 01:29:34PM -0700, Jacob Pan wrote:
> Hi Joerg/Jason/Christoph et all,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate the binding of the device and a page table should be performed
> on init_mm instead of the mm of the current process.Page requests and other
> differences between user and kernel SVA are handled as special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@xxxxxxxxxx/)
> There is also the lack of IOTLB synchronization upon kernel page table updates.
>
> This patchset is trying to address these concerns by having an explicit DMA
> API compatible model while continue to support in-kernel use of DMA requests
> with PASID. Specifically, the following DMA-IOMMU APIs are introduced:
>
> int iommu_dma_pasid_enable/disable(struct device *dev,
> struct iommu_domain **domain,
> enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
> void *cpu_addr,size_t size, int prot);

I'm not convinced this is going in the right direction..

You should create/find a 'struct device' for the PASID and use the
normal DMA API, not try to create a parallel DMA API under the iommu
framework.

Again, there should no driver in Linux doing DMA without going through
the normal DMA API.

> The following three addressing modes are supported with example API usages
> by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
> can DMA pass through IOMMU on a per PASID basis.
> Example:
> pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
> /* Use the returning PASID and PA for work submission */

And why should this even be a choice given to drivers?

Drivers do not get to self declare their "trustiness" - this is only
set by the admin.

PASID tagged DMA is no different than any other DMA and needs to
follow the global admin set IOMMU modes - without any driver knob to
change behaviors.

The API design should look more like this:

u32 hw_pasid;
struct device *pasid_dev = iommu_get_pasid_device_handle(pci_device, &hw_pasid);
dma_addr_t addr = dma_map_XX(pasid_dev, buf, size)

'tell HW to do DMA'(hw_pasid, addr, size)

dma_unmap_XX(pasid_dev, addr, size);

If there is any performance tunable around how the IO page table is
consutrcted then the IOMMU layer will handle it transparently from
global config, just as it does for every other DMA out there.

> 1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
> result of module loading/eBPF load. Adding kernel mmu notifier?

I'm deeply skeptical we should even have "KSVA" and would want to see
a lot of performance justification to introduce something like
this.

Given that basically only valloc memory could truely benefit from it,
I don't expect to see much win, especially when balanced with
burdening all valloc users with an IO page table synchronization.

Certainly it should not be part of a patch series fixing kPASID
support for basic DMA, and still doesn't excuse skpping the DMA API -
that is still mandatory for portability to support cache flushing.

Jason