Re: [PATCH 3/3] iommu: armsmmu: set iommu ops for rpmsg bus
From: Robin Murphy
Date: Fri May 11 2018 - 14:24:20 EST
On 07/05/18 20:28, Bjorn Andersson wrote:
On Fri, Mar 2, 2018 at 8:59 AM, Robin Murphy <robin.murphy@xxxxxxx> wrote:
On 02/03/18 14:55, srinivas.kandagatla@xxxxxxxxxx wrote:
From: Srinivas Kandagatla <srinivas.kandagatla@xxxxxxxxxx>
On Qualcomm SoCs, ADSP exposes many functions like audio and
others. These services need iommu access to allocate any
memory for the DSP. As these drivers are children of the
rpmsg bus, being able to allocate memory from IOMMUs is a basic
requirement. So set the ARM SMMU iommu ops for this bus type.
Forgot to answer this and the dma_ops patch seems to be going in the
right direction.
Documentation/rpmsg.txt: "Every rpmsg device is a communication channel with
a remote processor (thus rpmsg devices are called channels)."
I'd instinctively assume that a remote processor already has its own memory,
and that a communication channel doesn't somehow go directly through an
IOMMU, so that "basic requirement" seems like a pretty big assumption.
As of today rpmsg exclusively uses system memory for implementing the
communication fifos, but this memory is owned/handled by the rpmsg
bus. The need here is for drivers on top of the rpmsg_bus,
implementing some application-level protocol that requires indirection
buffers; e.g. to achieve zero copy of audio or image buffers that the
remote processor is expected to operate on. In this case the device
sitting on top of the rpmsg bus will have to map the buffer to the
appropriate context and can then send application specific control
requests referencing this mapping.
Right, but that's more or less what I was getting at - rpmsg can be used
as a means to signal some DMA master device to start doing a thing, but
that thing itself is unrelated to rpmsg, and it by no means implies that
everything which rpmsg can talk to is always capable of system-wide DMA.
It's no different if that communication channel is a hardware mailbox or
an I2C/SPI/USB/etc. link, rather than virtio; we wouldn't automatically
consider devices on the other end of those to be directly connected to
an IOMMU either.
IOMMU and DMA operations are highly dependent on the physical hardware
topology, which is why I really don't like trying to shoehorn them into
software constructs without modelling the actual hardware reasonably
accurately. For instance it's not unheard of for remote processors in a
SoC to see a different physical memory map from the main application
processors - how would rpmsg try to describe that? What even is the
address space of the rpmsg "bus"?
As different parts of the firmware might operate in different contexts
it's not feasible to utilize the parent's (the rpmsg_bus) context for
these indirection buffers.
Indeed, and I maintain that that wouldn't be the right thing to do
anyway. As before, I think the most accurate way to model the situation
with the tools we have available is to have the actual hardware function
represented by a platform device, which is associated with a
corresponding rpmsg endpoint. Then the driver can manage communication
in the rpmsg context, and physical DMA setup in the 'real' hardware
context, and everything looks sane without questionable abstraction
breakage. Since this looked to be more or less what is actually
implemented anyway, it doesn't seem all that hard to refine; if there
are multiple DMA master functions identified distinctly to the IOMMU,
then they could either be represented as separate platform devices with
explicit IOMMU specifiers, or you could model the actual DSP subsystem
hardware as its own bus-like arrangement with an iommu-map arrangement
translating function identifiers to IOMMU identifiers.
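To make the iommu-map option concrete, here is a minimal userspace sketch of the range translation such a property describes: a function ID is looked up in a table of (base, output base, length) ranges to obtain the ID the IOMMU sees. The struct layout, the names (id_map_entry, example_dsp_map, map_function_id) and the values are assumptions made for illustration only; this is not kernel code and not taken from any real binding or SoC.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One range of an iommu-map-style table (hypothetical layout):
 * function IDs [id_base, id_base + length) translate to IOMMU
 * stream IDs starting at sid_base.
 */
struct id_map_entry {
	uint32_t id_base;
	uint32_t sid_base;
	uint32_t length;
};

/* Made-up example values, not taken from any real SoC. */
static const struct id_map_entry example_dsp_map[] = {
	{ .id_base = 0, .sid_base = 0x20, .length = 4 },
	{ .id_base = 8, .sid_base = 0x40, .length = 2 },
};

#define EXAMPLE_DSP_MAP_LEN \
	(sizeof(example_dsp_map) / sizeof(example_dsp_map[0]))

/* Return the stream ID for a function ID, or -1 if no range covers it. */
static int64_t map_function_id(const struct id_map_entry *map, size_t n,
			       uint32_t id)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t offset = id - map[i].id_base;

		/* The first test guards against wrap-around in offset. */
		if (id >= map[i].id_base && offset < map[i].length)
			return (int64_t)map[i].sid_base + offset;
	}
	return -1;
}
```

With this table, function 1 would be presented to the IOMMU as stream ID 0x21, while function 5 falls in no range and is left untranslated, so a driver would have no IOMMU context for it.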
What I don't like is forcing IOMMU drivers to pretend that some data in
a shared memory buffer is itself directly capable of generating
transactions on the interconnect. If other 'indirect' bus abstractions
like CoreSight can get this right, I don't see why rpmsg deserves to be
special.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@xxxxxxxxxx>
---
drivers/iommu/arm-smmu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e6920d32ac9e..9b63489af15c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -53,6 +53,7 @@
#include <linux/spinlock.h>
#include <linux/amba/bus.h>
+#include <linux/rpmsg.h>
#include "io-pgtable.h"
#include "arm-smmu-regs.h"
@@ -2168,6 +2169,10 @@ static void arm_smmu_bus_init(void)
bus_set_iommu(&pci_bus_type, &arm_smmu_ops);
}
#endif
+#ifdef CONFIG_RPMSG
Ah, so this will at least build OK with RPMSG=m, but I doubt it does what
you want it to in that case.
Things have been refactored but the core has remained tristate,
causing extra headaches in various areas. I think it's very
reasonable to review the rpmsg config options and make CONFIG_RPMSG
bool.
So with the addition of making CONFIG_RPMSG bool the patch has my Acked-by.
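For context on the RPMSG=m remark, the following userspace sketch simulates what the preprocessor sees for a tristate option built as a module: kbuild defines CONFIG_RPMSG_MODULE rather than CONFIG_RPMSG, so a plain #ifdef CONFIG_RPMSG block is silently compiled out. The DEMO_* macros are simplified copies modelled on the helpers in include/linux/kconfig.h, and the two hook_* functions are hypothetical stand-ins for the guarded bus_set_iommu() call.

```c
/* Simulate RPMSG=m: only the _MODULE symbol is defined. */
#define CONFIG_RPMSG_MODULE 1

/* Simplified helpers modelled on include/linux/kconfig.h (assumed names). */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_2(x)
#define DEMO_IS_DEFINED_2(val) DEMO_IS_DEFINED_3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED_3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

/* Mirrors the guard used in the patch: true only for RPMSG=y. */
static int hook_with_ifdef(void)
{
#ifdef CONFIG_RPMSG
	return 1;
#else
	return 0;
#endif
}

/* True for both RPMSG=y and RPMSG=m. */
static int hook_with_is_enabled(void)
{
	return DEMO_IS_ENABLED(CONFIG_RPMSG);
}
```

Here hook_with_ifdef() returns 0 and hook_with_is_enabled() returns 1. Switching the guard to IS_ENABLED() would probably not be enough on its own, though: a built-in SMMU driver would then reference a bus symbol that lives in a module, which is presumably part of why making the option bool is the simpler fix.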
That said I'm generally concerned about the first probed iommu
implementation assigning itself as the sole iommu implementation for
all busses, but I guess we haven't yet hit the point where there are
different iommu implementations in a single SoC?
As it happens I do know of one such SoC already - Rockchip RK3288 seems
to have an undocumented Arm SMMU alongside all the rockchip-iommu
instances, but it's not used by Linux (and I have no idea what it was
intended for; I just went and poked the intriguing "peripheral MMU"
region of the memory map and found what looks an awful lot like an Arm
MMU-400). More realistically, I also know of folks using the Arm Juno
dev board with an MMU-600 in the add-on FPGA tile, which would have that
driver-probe-order fight with the MMU-401 instances in the SoC, but I
figure they were either using an older firmware which didn't enable the
latter or just got lucky with not having the SMMUv2 driver enabled.
But yes, the per-bus ops thing is awful and I've been complaining about
it for years now. Since iommu_fwspec we at least have the foundations
for per-device ops in place now, but as is often the case, getting 80%
of the way there is simple[1], whilst the last 20% (like replacing
iommu_domain_alloc(), and where to call iommu_{add,remove}_device()
from) is really hard.
Robin.
[1]
https://www.mail-archive.com/iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx/msg14576.html