[RFC 0/6] iommu/dma: s390 DMA API conversion and optimized IOTLB flushing
From: Niklas Schnelle
Date: Wed Oct 19 2022 - 10:53:41 EST
Hi All,
This patch series converts s390's PCI support from its platform specific DMA
API implementation in arch/s390/pci/pci_dma.c to the common DMA IOMMU layer.
The conversion itself is done in patch 3. After applying my previous s390
IOMMU series [0, 1], it only touches the s390 IOMMU driver and arch code,
moving the remaining functions over from the s390 DMA code. No changes to
common code are necessary.
After patch 3 the conversion is complete in principle, and on our
machine-level partitioning hypervisor (LPAR) performance matches or exceeds
that of the existing code. When running under z/VM or KVM, however,
performance plummets to about half of the existing code due to a much higher
rate of IOTLB flushes for unmapped pages. Because these hypervisors use
IOTLB flushes to synchronize their shadow tables, the flushes are very
expensive and minimizing them is essential.
To counter this performance drop, patches 4-5 propose a new single-queue
IOTLB flushing scheme as an alternative to the existing per-CPU flush
queues. Introducing an alternative scheme was also suggested by Robin
Murphy [2]. The single queue allows batching a much larger number of lazily
freed IOVAs and was also chosen because hypervisors tend to serialize IOTLB
flushes, which removes some of the gains of multiple queues. To bound how
long IOVAs stay queued, a timeout of 1 second is used and postponed whenever
a flush happens. In my tests this scheme brought performance under z/VM and
KVM on par with the existing code.
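To illustrate the idea, here is a rough sketch of such a single batched
flush queue with the 1 second timeout. All names in it (single_flush_queue,
sq_entry, sq_flush_iotlb(), sq_free_entries(), SQ_MAX_ENTRIES) are made up
for illustration and do not match the actual code in patch 5:

/*
 * Rough sketch only: a single, shared flush queue that batches lazily
 * freed IOVAs and bounds their lifetime with a 1 second timer that is
 * postponed on every flush. Helpers and the batching limit are made up.
 */
#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/timer.h>

#define SQ_TIMEOUT	msecs_to_jiffies(1000)	/* 1 second bound from above */
#define SQ_MAX_ENTRIES	8192			/* arbitrary batching limit */

struct single_flush_queue {
	spinlock_t lock;
	struct list_head lazy;		/* IOVAs unmapped but not yet flushed */
	unsigned long count;
	struct timer_list timer;
};

static void sq_flush(struct single_flush_queue *sq)
{
	/* One IOTLB flush covers the whole batch ... */
	sq_flush_iotlb(sq);			/* made-up helper */
	/* ... after which the queued IOVAs can be reused. */
	sq_free_entries(&sq->lazy);		/* made-up helper */
	sq->count = 0;
	/* Postpone the timeout whenever a flush happens. */
	mod_timer(&sq->timer, jiffies + SQ_TIMEOUT);
}

static void sq_queue(struct single_flush_queue *sq, struct sq_entry *entry)
{
	unsigned long flags;

	spin_lock_irqsave(&sq->lock, flags);
	list_add_tail(&entry->node, &sq->lazy);
	if (++sq->count >= SQ_MAX_ENTRIES)
		sq_flush(sq);
	spin_unlock_irqrestore(&sq->lock, flags);
}

static void sq_timer_fn(struct timer_list *t)
{
	struct single_flush_queue *sq = from_timer(sq, t, timer);
	unsigned long flags;

	spin_lock_irqsave(&sq->lock, flags);
	if (sq->count)
		sq_flush(sq);
	spin_unlock_irqrestore(&sq->lock, flags);
}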
As it is implemented in common code, this IOTLB flushing scheme can of
course be used by other platforms with expensive IOTLB flushes; virtio-iommu
in particular might be a candidate. With this series, however, only s390
systems that require IOTLB flushes on map default to it, while LPAR keeps
using the per-CPU queues.
I did verify that the new scheme works on my x86 Ryzen workstation by
locally modifying drivers/iommu/iommu.c:iommu_subsys_init() to default to it
and confirming its use via "cat /sys/bus/pci/devices/*/iommu_group/type". I
found no problems with an AMD GPU, an Intel NIC (with SR-IOV), NVMes, or any
on-board peripherals, though I did not perform any meaningful performance
tests.
Another complication is that on z/VM and KVM our IOTLB flushes can return an
error indicating that the hypervisor has run out of IOVAs and needs us to
flush the queue before new mappings can be created. In strict mode this of
course doesn't happen, but with queuing we need to handle it. This is done
in patch 6.
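To sketch what this handling looks like with queuing enabled (again with
made-up names: s390_iotlb_flush() stands in for the RPCIT wrapper,
s390_flush_queued_iovas() for draining the flush queue, and the -ENOMEM
convention is purely illustrative):

/*
 * Rough sketch of reacting to an "out of resources" RPCIT result with
 * queuing enabled: drain the queue so the hypervisor can release the
 * shadow-table resources of already unmapped IOVAs, then retry once.
 * All names and the -ENOMEM convention are illustrative only.
 */
static int s390_sync_map(struct s390_domain_sketch *domain,
			 unsigned long iova, size_t size)
{
	int rc;

	rc = s390_iotlb_flush(domain, iova, size);	/* made-up RPCIT wrapper */
	if (rc != -ENOMEM)
		return rc;

	/* Hypervisor ran out of IOVA resources: flush queued IOVAs ... */
	s390_flush_queued_iovas(domain);		/* made-up helper */
	/* ... and retry the original flush once. */
	return s390_iotlb_flush(domain, iova, size);
}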
As with previous series, this is available via my git.kernel.org tree [3] in
the dma_iommu_v1 branch with the s390_dma_iommu_v1 tag.
Best regards,
Niklas
[0] https://lore.kernel.org/linux-iommu/20221017124558.1386337-1-schnelle@xxxxxxxxxxxxx/
[1] https://lore.kernel.org/linux-iommu/20221018145132.998866-1-schnelle@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-iommu/3e402947-61f9-b7e8-1414-fde006257b6f@xxxxxxx/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/
Niklas Schnelle (6):
s390/ism: Set DMA coherent mask
s390/pci: prepare is_passed_through() for dma-iommu
s390/pci: Use dma-iommu layer
iommu/dma: Prepare for multiple flush queue implementations
iommu/dma: Add simple batching flush queue implementation
iommu/s390: flush queued IOVAs on RPCIT out of resource indication
.../admin-guide/kernel-parameters.txt | 9 +-
arch/s390/include/asm/pci.h | 7 -
arch/s390/include/asm/pci_dma.h | 120 +--
arch/s390/pci/Makefile | 2 +-
arch/s390/pci/pci.c | 22 +-
arch/s390/pci/pci_bus.c | 5 -
arch/s390/pci/pci_debug.c | 13 +-
arch/s390/pci/pci_dma.c | 732 ------------------
arch/s390/pci/pci_event.c | 17 +-
arch/s390/pci/pci_sysfs.c | 19 +-
drivers/iommu/Kconfig | 3 +-
drivers/iommu/dma-iommu.c | 307 ++++++--
drivers/iommu/dma-iommu.h | 1 +
drivers/iommu/iommu.c | 19 +-
drivers/iommu/s390-iommu.c | 415 +++++++++-
drivers/s390/net/ism_drv.c | 4 +
include/linux/iommu.h | 6 +
17 files changed, 707 insertions(+), 994 deletions(-)
delete mode 100644 arch/s390/pci/pci_dma.c
--
2.34.1