Re: [tip: x86/urgent] x86/dma: Tear down DMA ops on driver unbind

From: Borislav Petkov
Date: Sat Apr 17 2021 - 08:06:57 EST


On Thu, Apr 15, 2021 at 09:00:57AM -0000, tip-bot2 for Jean-Philippe Brucker wrote:
> The following commit has been merged into the x86/urgent branch of tip:
>
> Commit-ID: 9f8614f5567eb4e38579422d38a1bdfeeb648ffc
> Gitweb: https://git.kernel.org/tip/9f8614f5567eb4e38579422d38a1bdfeeb648ffc
> Author: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> AuthorDate: Wed, 14 Apr 2021 10:26:34 +02:00
> Committer: Borislav Petkov <bp@xxxxxxx>
> CommitterDate: Thu, 15 Apr 2021 10:27:29 +02:00
>
> x86/dma: Tear down DMA ops on driver unbind
>
> Since
>
> 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")
>
> a user can switch a device between IOMMU and direct DMA through sysfs.
> This doesn't work for AMD IOMMU at the moment because dev->dma_ops is
> not cleared when switching from a DMA to an identity IOMMU domain. The
> DMA layer thus attempts to use the dma-iommu ops on an identity domain,
> causing an oops:
>
> # echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/unbind
> # echo identity > /sys/bus/pci/devices/0000:00:05.0/iommu_group/type
> # echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/bind
> ...
> BUG: kernel NULL pointer dereference, address: 0000000000000028
> ...
> Call Trace:
> iommu_dma_alloc
> e1000e_setup_tx_resources
> e1000e_open
>
> Implement arch_teardown_dma_ops() on x86 to clear the device's dma_ops
> pointer during driver unbind.
>
> [ bp: Massage commit message. ]
>
> Fixes: 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Signed-off-by: Borislav Petkov <bp@xxxxxxx>
> Link: https://lkml.kernel.org/r/20210414082633.877461-1-jean-philippe@xxxxxxxxxx
> ---
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/pci-dma.c | 7 +++++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2792879..2c90f8d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -85,6 +85,7 @@ config X86
>  	select ARCH_HAS_STRICT_MODULE_RWX
>  	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
>  	select ARCH_HAS_SYSCALL_WRAPPER
> +	select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_DMA
>  	select ARCH_HAS_UBSAN_SANITIZE_ALL
>  	select ARCH_HAS_DEBUG_WX
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index de234e7..60a4ec2 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -154,3 +154,10 @@ static void via_no_dac(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
>  				PCI_CLASS_BRIDGE_PCI, 8, via_no_dac);
>  #endif
> +
> +#ifdef CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS
> +void arch_teardown_dma_ops(struct device *dev)
> +{
> +	set_dma_ops(dev, NULL);
> +}
> +#endif
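
For readers following along, the mechanism the quoted patch relies on can be modeled in user space. This is only a rough sketch, not kernel code: the structs are cut-down stand-ins, and model_iommu_ops and model_uses_ops_table() are hypothetical names introduced for illustration. It assumes the generic DMA layer's behaviour that a NULL dev->dma_ops selects the direct-mapping path, while a non-NULL pointer dispatches through the ops table.

```c
/*
 * User-space model only. The types below are simplified stand-ins for the
 * kernel's struct device and struct dma_map_ops, not the real definitions.
 */
#include <assert.h>
#include <stddef.h>

struct dma_map_ops {
	const char *name;
};

struct device {
	const struct dma_map_ops *dma_ops;
};

/* Hypothetical stand-in for the dma-iommu ops table. */
const struct dma_map_ops model_iommu_ops = { .name = "dma-iommu" };

/* Same shape as the kernel helper of this name: install an ops table. */
void set_dma_ops(struct device *dev, const struct dma_map_ops *ops)
{
	assert(dev != NULL);
	dev->dma_ops = ops;
}

/* What the quoted patch adds for x86: clear the pointer at driver unbind. */
void arch_teardown_dma_ops(struct device *dev)
{
	set_dma_ops(dev, NULL);
}

/*
 * Hypothetical helper: returns 1 if an allocation would be dispatched
 * through the ops table, 0 if it would take the direct-mapping path.
 */
int model_uses_ops_table(const struct device *dev)
{
	return dev->dma_ops != NULL;
}
```

In this model, a device whose pointer has been cleared stays on the direct path until something installs an ops table again. Which code is responsible for reinstalling it, and when, is outside the sketch.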

Nope, sorry, no joy. Zapping it from tip.

With that patch, it fails booting on my test box with messages like
(typing up from video I took):

...
ata: softreset failed (1st FIS failed)
ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]
ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]
<--- EOF

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette