Re: [PATCH v4 04/14] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches
From: Dan Williams
Date: Tue May 08 2018 - 10:31:38 EST
On Mon, Apr 23, 2018 at 4:30 PM, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
> For peer-to-peer transactions to work the downstream ports in each
> switch must not have the ACS flags set. At this time there is no way
> to dynamically change the flags and update the corresponding IOMMU
> groups so this is done at enumeration time before the groups are
> assigned.
>
> This effectively means that if CONFIG_PCI_P2PDMA is selected then
> all devices behind any PCIe switch heirarchy will be in the same IOMMU
> group. Which implies that individual devices behind any switch
> heirarchy will not be able to be assigned to separate VMs because
> there is no isolation between them. Additionally, any malicious PCIe
> devices will be able to DMA to memory exposed by other EPs in the same
> domain as TLPs will not be checked by the IOMMU.
>
> Given that the intended use case of P2P Memory is for users with
> custom hardware designed for purpose, we do not expect distributors
> to ever need to enable this option. Users that want to use P2P
> must have compiled a custom kernel with this configuration option
> and understand the implications regarding ACS. They will either
> not require ACS or will have design the system in such a way that
> devices that require isolation will be separate from those using P2P
> transactions.
>
> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx>
> ---
> drivers/pci/Kconfig | 9 +++++++++
> drivers/pci/p2pdma.c | 45 ++++++++++++++++++++++++++++++---------------
> drivers/pci/pci.c | 6 ++++++
> include/linux/pci-p2pdma.h | 5 +++++
> 4 files changed, 50 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index b2396c22b53e..b6db41d4b708 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -139,6 +139,15 @@ config PCI_P2PDMA
> transations must be between devices behind the same root port.
> (Typically behind a network of PCIe switches).
>
> + Enabling this option will also disable ACS on all ports behind
> + any PCIe switch. This effectively puts all devices behind any
> + switch heirarchy into the same IOMMU group. Which implies that
> + individual devices behind any switch will not be able to be
> + assigned to separate VMs because there is no isolation between
> + them. Additionally, any malicious PCIe devices will be able to
> + DMA to memory exposed by other EPs in the same domain as TLPs
> + will not be checked by the IOMMU.
> +
> If unsure, say N.
It seems unwieldy that this is a compile time option and not a runtime
option. Can't we have a kernel command line option to opt-in to this
behavior rather than require a wholly separate kernel image?
Why is this text added in a follow on patch and not the patch that
introduced the config option?
I'm also wondering if that command line option can take a 'bus device
function' address of a switch to limit the scope of where ACS is
disabled.