Re: [PATCH] of/device: add blacklist for iommu dma_ops

From: Thierry Reding
Date: Mon Jun 03 2019 - 09:43:14 EST


On Mon, Jun 03, 2019 at 12:14:27PM +0100, Robin Murphy wrote:
> On 03/06/2019 11:47, Rob Clark wrote:
> > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 3, 2019 at 4:40 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, May 10, 2019 at 7:35 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Dec 4, 2018 at 2:29 PM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Sat, Dec 1, 2018 at 10:54 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > This solves a problem we see with drm/msm, caused by getting
> > > > > > > iommu_dma_ops while we attach our own domain and manage it directly at
> > > > > > > the iommu API level:
> > > > > > >
> > > > > > > [0000000000000038] user address but active_mm is swapper
> > > > > > > Internal error: Oops: 96000005 [#1] PREEMPT SMP
> > > > > > > Modules linked in:
> > > > > > > CPU: 7 PID: 70 Comm: kworker/7:1 Tainted: G W 4.19.3 #90
> > > > > > > Hardware name: xxx (DT)
> > > > > > > Workqueue: events deferred_probe_work_func
> > > > > > > pstate: 80c00009 (Nzcv daif +PAN +UAO)
> > > > > > > pc : iommu_dma_map_sg+0x7c/0x2c8
> > > > > > > lr : iommu_dma_map_sg+0x40/0x2c8
> > > > > > > sp : ffffff80095eb4f0
> > > > > > > x29: ffffff80095eb4f0 x28: 0000000000000000
> > > > > > > x27: ffffffc0f9431578 x26: 0000000000000000
> > > > > > > x25: 00000000ffffffff x24: 0000000000000003
> > > > > > > x23: 0000000000000001 x22: ffffffc0fa9ac010
> > > > > > > x21: 0000000000000000 x20: ffffffc0fab40980
> > > > > > > x19: ffffffc0fab40980 x18: 0000000000000003
> > > > > > > x17: 00000000000001c4 x16: 0000000000000007
> > > > > > > x15: 000000000000000e x14: ffffffffffffffff
> > > > > > > x13: ffff000000000000 x12: 0000000000000028
> > > > > > > x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > > > > > > x9 : 0000000000000000 x8 : ffffffc0fab409a0
> > > > > > > x7 : 0000000000000000 x6 : 0000000000000002
> > > > > > > x5 : 0000000100000000 x4 : 0000000000000000
> > > > > > > x3 : 0000000000000001 x2 : 0000000000000002
> > > > > > > x1 : ffffffc0f9431578 x0 : 0000000000000000
> > > > > > > Process kworker/7:1 (pid: 70, stack limit = 0x0000000017d08ffb)
> > > > > > > Call trace:
> > > > > > > iommu_dma_map_sg+0x7c/0x2c8
> > > > > > > __iommu_map_sg_attrs+0x70/0x84
> > > > > > > get_pages+0x170/0x1e8
> > > > > > > msm_gem_get_iova+0x8c/0x128
> > > > > > > _msm_gem_kernel_new+0x6c/0xc8
> > > > > > > msm_gem_kernel_new+0x4c/0x58
> > > > > > > dsi_tx_buf_alloc_6g+0x4c/0x8c
> > > > > > > msm_dsi_host_modeset_init+0xc8/0x108
> > > > > > > msm_dsi_modeset_init+0x54/0x18c
> > > > > > > _dpu_kms_drm_obj_init+0x430/0x474
> > > > > > > dpu_kms_hw_init+0x5f8/0x6b4
> > > > > > > msm_drm_bind+0x360/0x6c8
> > > > > > > try_to_bring_up_master.part.7+0x28/0x70
> > > > > > > component_master_add_with_match+0xe8/0x124
> > > > > > > msm_pdev_probe+0x294/0x2b4
> > > > > > > platform_drv_probe+0x58/0xa4
> > > > > > > really_probe+0x150/0x294
> > > > > > > driver_probe_device+0xac/0xe8
> > > > > > > __device_attach_driver+0xa4/0xb4
> > > > > > > bus_for_each_drv+0x98/0xc8
> > > > > > > __device_attach+0xac/0x12c
> > > > > > > device_initial_probe+0x24/0x30
> > > > > > > bus_probe_device+0x38/0x98
> > > > > > > deferred_probe_work_func+0x78/0xa4
> > > > > > > process_one_work+0x24c/0x3dc
> > > > > > > worker_thread+0x280/0x360
> > > > > > > kthread+0x134/0x13c
> > > > > > > ret_from_fork+0x10/0x18
> > > > > > > Code: d2800004 91000725 6b17039f 5400048a (f9401f40)
> > > > > > > ---[ end trace f22dda57f3648e2c ]---
> > > > > > > Kernel panic - not syncing: Fatal exception
> > > > > > > SMP: stopping secondary CPUs
> > > > > > > Kernel Offset: disabled
> > > > > > > CPU features: 0x0,22802a18
> > > > > > > Memory Limit: none
> > > > > > >
> > > > > > > The problem is that when drm/msm does it's own iommu_attach_device(),
> > > > > > > now the domain returned by iommu_get_domain_for_dev() is drm/msm's
> > > > > > > domain, and it doesn't have domain->iova_cookie.
> > > > > > >
> > > > > > > We kind of avoided this problem prior to sdm845/dpu because the iommu
> > > > > > > was attached to the mdp node in dt, which is a child of the toplevel
> > > > > > > mdss node (which corresponds to the dev passed in dma_map_sg()). But
> > > > > > > with sdm845, now the iommu is attached at the mdss level so we hit the
> > > > > > > iommu_dma_ops in dma_map_sg().
> > > > > > >
> > > > > > > But auto allocating/attaching a domain before the driver is probed was
> > > > > > > already a blocking problem for enabling per-context pagetables for the
> > > > > > > GPU. This problem is also now solved with this patch.
> > > > > > >
> > > > > > > Fixes: 97890ba9289c dma-mapping: detect and configure IOMMU in of_dma_configure
> > > > > > > Tested-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > > > > Signed-off-by: Rob Clark <robdclark@xxxxxxxxx>
> > > > > > > ---
> > > > > > > This is an alternative/replacement for [1]. What it lacks in elegance
> > > > > > > it makes up for in practicality ;-)
> > > > > > >
> > > > > > > [1] https://patchwork.freedesktop.org/patch/264930/
> > > > > > >
> > > > > > > drivers/of/device.c | 22 ++++++++++++++++++++++
> > > > > > > 1 file changed, 22 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > > > > > > index 5957cd4fa262..15ffee00fb22 100644
> > > > > > > --- a/drivers/of/device.c
> > > > > > > +++ b/drivers/of/device.c
> > > > > > > @@ -72,6 +72,14 @@ int of_device_add(struct platform_device *ofdev)
> > > > > > > return device_add(&ofdev->dev);
> > > > > > > }
> > > > > > >
> > > > > > > +static const struct of_device_id iommu_blacklist[] = {
> > > > > > > + { .compatible = "qcom,mdp4" },
> > > > > > > + { .compatible = "qcom,mdss" },
> > > > > > > + { .compatible = "qcom,sdm845-mdss" },
> > > > > > > + { .compatible = "qcom,adreno" },
> > > > > > > + {}
> > > > > > > +};
> > > > > >
> > > > > > Not completely clear to whether this is still needed or not, but this
> > > > > > really won't scale. Why can't the driver for these devices override
> > > > > > whatever has been setup by default?
> > > > > >
> > > > >
> > > > > fwiw, at the moment it is not needed, but it will become needed again
> > > > > to implement per-context pagetables (although I suppose for this we
> > > > > only need to blacklist qcom,adreno and not also the display nodes).
> > > >
> > > > So, another case I've come across, on the display side.. I'm working
> > > > on handling the case where bootloader enables display (and takes iommu
> > > > out of reset).. as soon as DMA domain gets attached we get iommu
> > > > faults, because bootloader has already configured display for scanout.
> > > > Unfortunately this all happens before actual driver is probed and has
> > > > a chance to intervene.
> > > >
> > > > It's rather unfortunate that we tried to be clever rather than just
> > > > making drivers call some function to opt-in to the hookup of dma iommu
> > > > ops :-(
> > >
> > > I think it still works for the 90% of cases and if 10% needs some
> > > explicit work in the drivers, that's better than requiring 100% of the
> > > drivers to do things manually.
>
> Right, it's not about "being clever", it's about not adding opt-in code to
> the hundreds and hundreds and hundreds of drivers which *might* ever find
> themselves used on a system where they would need the IOMMU's help for DMA.
>
> > > Adding Marek who had the same problem on Exynos.
> >
> > I do wonder how many drivers need to iommu_map in their ->probe()?
> > I'm thinking moving the auto-hookup to after a successful probe(),
> > with some function a driver could call if they need mapping in probe,
> > might be a way to eventually get rid of the blacklist. But I've no
> > idea how to find the subset of drivers that would be broken without a
> > dma_setup_iommu_stuff() call in their probe.
>
> Wouldn't help much. That particular issue is nothing to do with DMA ops
> really, it's about IOMMU initialisation. On something like SMMUv3 there is
> literally no way to turn the thing on without blocking unknown traffic - it
> *has* to have stream table entries programmed before it can even allow stuff
> to bypass.
>
> The answer is either to either pay attention to the "Quiesce all DMA capable
> devices" part of the boot protocol (which has been there since pretty much
> forever), or to come up with some robust way of communicating "live"
> boot-time mappings to IOMMU drivers so that they can program themselves
> appropriately during probe.

I ran into a similar issue not too long ago and I've been working on
what I think is a correct fix for this problem. Unfortunately I went
down the rabbit hole of trying to get all of the pieces in the
bootloader merged before posting the kernel patches, and that's been
taking a long time. Let me chime in here in the hope that it will be
helpful to others.

The problem that I ran into was pretty much the same thing that Rob
encountered. We have some platforms that will initialize display over
HDMI in an early bootloader to show a splash screen. During boot we
would usually take over display hardware by just resetting it and then
programming it from scratch. Ultimately we want to do seamless handover
so that the kernel can take over the splash and replace it by the
console when that's ready. But I'm getting ahead of myself.

One of the things I've been trying to do is replace direct IOMMU API
usage in the Tegra DRM driver by using DMA API only, which we need in
order to fix some corner cases we ran into (I've now forgotten most of
the details and I realize that my commit messages aren't all that
helpful...).

Anyway, when I started using the DMA API I was running into a massive
amount of IOMMU faults starting at early boot. It took me a while to
realize that this was because now the IOMMU was enabled before the
driver had a chance to set up the IOMMU domain. I think the only way
that we were getting around that was because we used to have a custom
IOMMU driver on older Tegra device and don't expose DMA API compatible
IOMMU domains. With newer Tegras using the ARM SMMU we don't really have
that option anymore.

So the root cause of this is that the bootloader initialized the display
controller to scan out from a physical address (the bootloader does not
set up the SMMU) and when the kernel attaches the display controller to
the SMMU during early boot, the display controller ends up trying to
fetch those physical addresses through the SMMU where no mapping for
those addresses exists, hence causing these faults.

The solution that I came up with is to use the reserved-memory bindings
along with memory-region references in the display controller nodes. I
have a patch for the Tegra SMMU driver that implements support for
parsing this data (the IOMMU framework has infrastructure to do this, so
I take it that this has come up before) and setting up 1:1 mappings
right before the SMMU is enabled for a device. This works rather well in
my testing. I've been using hard-coded values because the bootloader
does not properly put the reserved memory regions into the DT. I've been
trying to add that as a side-project, but it turned into a bit of a can
of worms.

These are all standard bindings, so I think we could implement something
similar in the ARM SMMU driver. In fact, I was going to do that once I
had sorted this all out for Tegra SMMU, but I'd be happy if anyone can
beat me to it.

I've attached the patch for Tegra SMMU here. I think pretty much the
same thing could be implement for any other drivers since this is based
on standard bindings. Perhaps there could even be helpers. Actually it
looks like one such helpers already exists:

iommu_dma_get_resv_regions()

I think the generic code looking up the standard memory-region property
could be added to that to expose this functionality more broadly.

One thing to note is that I have this workaround in the Tegra SMMU
driver to bind the IOMMU domain to a specific instance during this early
direct mappings business. But I think that may no longer be necessary
since there seems to have been some work in this area lately.
Specifically I'm looking at 7423e01741dd6a5f1255f589145313f0fb1c8cbe,
which may help. I can investigate, but let me post the patch first to
keep the discussion going.

Here's a short extract of how I'm using this in device tree:

--- >8 ---
diff --git a/arch/arm64/boot/dts/nvidia/tegra210-p2371-2180.dts b/arch/arm64/boot/dts/nvidia/tegra210-p2371-2180.dts
index 5a57396b5948..82c4e0c88740 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210-p2371-2180.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra210-p2371-2180.dts
@@ -8,6 +8,16 @@
model = "NVIDIA Jetson TX1 Developer Kit";
compatible = "nvidia,p2371-2180", "nvidia,tegra210";

+ reserved-memory {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ ranges;
+
+ framebuffer: framebuffer@92c9f000 {
+ reg = <0 0x92c9f000 0 0x00800000>;
+ };
+ };
+
pcie@1003000 {
status = "okay";

@@ -35,6 +45,14 @@
};

host1x@50000000 {
+ dc@54200000 {
+ memory-region = <&framebuffer>;
+ };
+
+ dc@54240000 {
+ memory-region = <&framebuffer>;
+ };
+
dsi@54300000 {
status = "okay";

--- >8 ---

Note that these entries should all be generated at runtime by the
bootloader once it has allocated the framebuffer. That's the can of
worms I've been talking about.

Attached is the SMMU driver patch.

Hope that helps,
Thierry
From 1c7c4fcc5364e48fc5390ac17dd7e4ca347a6eb4 Mon Sep 17 00:00:00 2001
From: Thierry Reding <treding@xxxxxxxxxx>
Date: Fri, 15 Feb 2019 15:16:44 +0100
Subject: [PATCH] iommu/tegra-smmu: Add support for reserved regions

The Tegra DRM driver currently uses the IOMMU API explicitly. This means
that it has fine-grained control over when exactly the translation
through the IOMMU is enabled. This currently happens after the driver
probes, so the driver is in a DMA quiesced state when the IOMMU
translation is enabled.

During the transition of the Tegra DRM driver to use the DMA API instead
of the IOMMU API explicitly, it was observed that on certain platforms
the display controllers were still actively fetching from memory. When a
DMA IOMMU domain is created as part of the DMA/IOMMU API setup during
boot, the IOMMU translation for the display controllers can be enabled a
significant amount of time before the driver has had a chance to reset
the hardware into a sane state. This causes the SMMU to detect faults on
the addresses that the display controller is trying to fetch.

To avoid this, and as a byproduct paving the way for seamless transition
of display from the bootloader to the kernel, add support for reserved
regions in the Tegra SMMU driver. This is implemented using the standard
reserved memory device tree bindings, which let us describe regions of
memory which the kernel is forbidden from using for regular allocations.
The Tegra SMMU driver will parse the nodes associated with each device
via the "memory-region" property and return reserved regions that the
IOMMU core will then create direct mappings for prior to attaching the
IOMMU domains to the devices. This ensures that a 1:1 mapping is in
place when IOMMU translation starts and prevents the SMMU from detecting
any faults.

Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>
---
drivers/iommu/tegra-smmu.c | 115 +++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 8e49869812b0..b96444be325d 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -12,6 +12,7 @@
#include <linux/iommu.h>
#include <linux/kernel.h>
#include <linux/of.h>
+#include <linux/of_address.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
@@ -545,6 +546,38 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
struct tegra_smmu *smmu = as->smmu;
u32 *pd = page_address(as->pd);
unsigned long offset = pd_index * sizeof(*pd);
+ bool unmap = false;
+
+ /*
+ * XXX Move this outside of this function. Perhaps add a struct
+ * iommu_domain parameter to ->{get,put}_resv_regions() so that
+ * the mapping can be done there.
+ *
+ * The problem here is that as->smmu is only known once we attach
+ * the domain to a device (because then we look up the right SMMU
+ * instance via the dev->archdata.iommu pointer). When the direct
+ * mappings are created for reserved regions, the domain has not
+ * been attached to a device yet, so we don't know. We currently
+ * fix that up in ->apply_resv_regions() because that is the first
+ * time where we have access to a struct device that will be used
+ * with the IOMMU domain. However, that's asymmetric and doesn't
+ * take care of the page directory mapping either, so we need to
+ * come up with something better.
+ */
+ if (as->pd_dma == 0) {
+ as->pd_dma = dma_map_page(smmu->dev, as->pd, 0, SMMU_SIZE_PD,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(smmu->dev, as->pd_dma))
+ return;
+
+ if (!smmu_dma_addr_valid(smmu, as->pd_dma)) {
+ dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD,
+ DMA_TO_DEVICE);
+ return;
+ }
+
+ unmap = true;
+ }

/* Set the page directory entry first */
pd[pd_index] = value;
@@ -557,6 +590,12 @@ static void tegra_smmu_set_pde(struct tegra_smmu_as *as, unsigned long iova,
smmu_flush_ptc(smmu, as->pd_dma, offset);
smmu_flush_tlb_section(smmu, as->id, iova);
smmu_flush(smmu);
+
+ if (unmap) {
+ dma_unmap_page(smmu->dev, as->pd_dma, SMMU_SIZE_PD,
+ DMA_TO_DEVICE);
+ as->pd_dma = 0;
+ }
}

static u32 *tegra_smmu_pte_offset(struct page *pt_page, unsigned long iova)
@@ -897,6 +936,79 @@ static struct iommu_group *tegra_smmu_device_group(struct device *dev)
return group;
}

+static void tegra_smmu_get_resv_regions(struct device *dev, struct list_head *list)
+{
+ struct of_phandle_iterator it;
+ int err;
+
+ if (!dev->of_node)
+ return;
+
+ of_for_each_phandle(&it, err, dev->of_node, "memory-region", NULL, 0) {
+ struct iommu_resv_region *region;
+ struct resource res;
+
+ err = of_address_to_resource(it.node, 0, &res);
+ if (err < 0) {
+ dev_err(dev, "failed to parse memory region %pOFn: %d\n",
+ it.node, err);
+ continue;
+ }
+
+ region = iommu_alloc_resv_region(res.start, resource_size(&res),
+ IOMMU_READ | IOMMU_WRITE,
+ IOMMU_RESV_DIRECT);
+ if (!region)
+ return;
+
+ dev_dbg(dev, "reserved region %pR\n", &res);
+ list_add_tail(&region->list, list);
+ }
+}
+
+static void tegra_smmu_put_resv_regions(struct device *dev,
+ struct list_head *list)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, list, list)
+ kfree(entry);
+}
+
+static void tegra_smmu_apply_resv_region(struct device *dev,
+ struct iommu_domain *domain,
+ struct iommu_resv_region *region)
+{
+ struct tegra_smmu *smmu = dev->archdata.iommu;
+ struct tegra_smmu_as *as = to_smmu_as(domain);
+
+ /*
+ * ->attach_dev() may not have been called yet at this point, so the
+ * address space may not have been associated with an SMMU instance.
+ * Set up the association here to make sure subsequent code can rely
+ * on the SMMU instance being known.
+ *
+ * Also make sure that the SMMU instance doesn't conflict if an SMMU
+ * has been associated with the address space already. This can happen
+ * if a domain is shared between multiple devices.
+ *
+ * Note that this is purely theoretic because there are no known SoCs
+ * with multiple instances of this SMMU.
+ *
+ * XXX Deal with this elsewhere. One possibility would be to pass the
+ * struct iommu_domain that we're operating on to ->get_resv_regions()
+ * and ->put_resv_regions() so that the connection between it and the
+ * struct device (in order to find the SMMU instance) can already be
+ * established at that time. This would be nicely symmetric because a
+ * ->put_resv_regions() could undo that again so that ->attach_dev()
+ * could start from a clean slate.
+ */
+ if (as->smmu && as->smmu != smmu)
+ WARN(1, "conflicting SMMU instances\n");
+
+ as->smmu = smmu;
+}
+
static int tegra_smmu_of_xlate(struct device *dev,
struct of_phandle_args *args)
{
@@ -917,6 +1029,9 @@ static const struct iommu_ops tegra_smmu_ops = {
.map = tegra_smmu_map,
.unmap = tegra_smmu_unmap,
.iova_to_phys = tegra_smmu_iova_to_phys,
+ .get_resv_regions = tegra_smmu_get_resv_regions,
+ .put_resv_regions = tegra_smmu_put_resv_regions,
+ .apply_resv_region = tegra_smmu_apply_resv_region,
.of_xlate = tegra_smmu_of_xlate,
.pgsize_bitmap = SZ_4K,
};
--
2.21.0

Attachment: signature.asc
Description: PGP signature