Re: [PATCH] of/device: add blacklist for iommu dma_ops
From: Thierry Reding
Date: Mon Jun 03 2019 - 10:44:11 EST

On Mon, Jun 03, 2019 at 07:20:14AM -0700, Rob Clark wrote:
> On Mon, Jun 3, 2019 at 6:54 AM Thierry Reding <thierry.reding@xxxxxxxxx> wrote:
> > On Mon, Jun 03, 2019 at 06:20:57AM -0700, Rob Clark wrote:
> > > On Mon, Jun 3, 2019 at 4:14 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
> > > >
> > > > On 03/06/2019 11:47, Rob Clark wrote:
> > > > > On Sun, Jun 2, 2019 at 11:25 PM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > > > >>
> > > > >> On Mon, Jun 3, 2019 at 4:40 AM Rob Clark <robdclark@xxxxxxxxx> wrote:
> > > > >>>
> > > > >>> So, another case I've come across, on the display side.. I'm working
> > > > >>> on handling the case where bootloader enables display (and takes iommu
> > > > >>> out of reset).. as soon as DMA domain gets attached we get iommu
> > > > >>> faults, because bootloader has already configured display for scanout.
> > > > >>> Unfortunately this all happens before actual driver is probed and has
> > > > >>> a chance to intervene.
> > > > >>>
> > > > >>> It's rather unfortunate that we tried to be clever rather than just
> > > > >>> making drivers call some function to opt-in to the hookup of dma iommu
> > > > >>> ops :-(
> > > > >>
> > > > >> I think it still works for the 90% of cases and if 10% needs some
> > > > >> explicit work in the drivers, that's better than requiring 100% of the
> > > > >> drivers to do things manually.
> > > >
> > > > Right, it's not about "being clever", it's about not adding opt-in code
> > > > to the hundreds and hundreds and hundreds of drivers which *might* ever
> > > > find themselves used on a system where they would need the IOMMU's help
> > > > for DMA.
> > >
> > > Well, I mean, at one point we didn't do the automatic iommu hookup, we
> > > could have just stuck with that and added a helper so drivers could
> > > request the hookup. Things wouldn't have been any more broken than
> > > before, and when people bring up systems where iommu is required, they
> > > could have added the necessary dma_iommu_configure() call. But that
> > > is water under the bridge now.
> > >
> > > > >> Adding Marek who had the same problem on Exynos.
> > > > >
> > > > > I do wonder how many drivers need to iommu_map in their ->probe()?
> > > > > I'm thinking moving the auto-hookup to after a successful probe(),
> > > > > with some function a driver could call if they need mapping in probe,
> > > > > might be a way to eventually get rid of the blacklist. But I've no
> > > > > idea how to find the subset of drivers that would be broken without a
> > > > > dma_setup_iommu_stuff() call in their probe.
> > > >
> > > > Wouldn't help much. That particular issue is nothing to do with DMA ops
> > > > really, it's about IOMMU initialisation. On something like SMMUv3 there
> > > > is literally no way to turn the thing on without blocking unknown
> > > > traffic - it *has* to have stream table entries programmed before it can
> > > > even allow stuff to bypass.
> > >
> > > fwiw, on these sdm850 laptops (and I think sdm845 boards like mtp and
> > > db845c) the SMMU (v2) is taken out of bypass by the bootloader. Bjorn
> > > has some patches for arm-smmu to read-back the state at boot.
> > >
> > > (Although older devices were booting with display enabled but SMMU in bypass.)
> > >
> > > > The answer is either to pay attention to the "Quiesce all DMA
> > > > capable devices" part of the boot protocol (which has been there since
> > > > pretty much forever), or to come up with some robust way of
> > > > communicating "live" boot-time mappings to IOMMU drivers so that they
> > > > can program themselves appropriately during probe.
> > >
> > > Unfortunately display lit up by bootloader is basically ubiquitous.
> > > Every single android phone does it. All of the windows-arm laptops do
> > > it. Basically 99.9% of things that have a display do it. It's a
> > > tough problem to solve involving clks, gdsc, regulators, etc, in
> > > addition to the display driver.. and sadly now smmu. And devices
> > > where we can make changes to and update the firmware are rather rare.
> > > So there is really no option to ignore this problem.
> >
> > I think this is going to require at least some degree of cooperation
> > from the bootloader. See my other thread on that. Unfortunately I think
> > this is an area where everyone has kind of been doing their own thing
> > even if standard bindings for this have been around for quite a while
> > (actually 5 years by now). I suspect that most bootloaders that run
> > today are not that old, but as always downstream doesn't follow closely
> > where upstream is guiding.
> >
> > > I guess if we had some early-quirks mechanism like x86 does, we could
> > > mash the display off early in boot. That would be an easy solution.
> > > Although I'd prefer a proper solution so that android phones aren't
> > > carrying around enormous stacks of hack patches to achieve a smooth
> > > flicker-free boot.
> >
> > The proper solution, I think, is for bootloader and kernel to work
> > together. Unfortunately that means we'll just have to bite the bullet
> > and get things fixed across the stack. I think this is just the latest
> > manifestation of the catch-up that upstream has been playing. Only now
> > that we're starting to enable all of these features upstream are we
> > running into interoperability issues.
> >
> > If upstream had been further along we would've caught these issues way
> > ahead of time and could've influenced the designs of bootloader much
> > earlier. Now, unless we get all the vendors to go back and modify
> > 5-year-old code, that's going to be difficult.
> >
> > However, I think Robin has a point here: it's clearly documented in the
> > boot protocol, so technically bootloaders are buggy and we can't always
> > go and fix things so that buggy bootloaders continue to work. There's
> > not a whole lot of incentive for anyone to fix the bootloaders if we
> > keep doing that, ey?
>
> A couple notes:
> 1) The odds of getting new bootloaders for 5-year-old phones are
> basically nil.. and they are typically signed so we couldn't just write our
> own even if we wanted to.
> 2) The windows arm laptops shipping actually have "real" UEFI+ACPI..
> for now we've been using device-tree to get linux booting on them.
> But I think we are going to need to shift to ACPI eventually.. so
> a dt specific solution isn't super helpful.
> But we do have EFI GOP to get the address of the boot framebuffer,
> and I believe there is a reserved memory region setup for it.
> Not sure how to connect that to the iommu subsys.

It shouldn't be a problem to hook something else up to the IOMMU
subsystem. Hopefully it's something that people are going to standardize
on.
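
Purely to illustrate the "communicate live boot-time mappings" idea, here
is a toy userspace model, not kernel code. Every toy_* name is made up for
this sketch and does not correspond to any existing API. The assumption is
that firmware hands over a list of regions that devices are still
accessing (for example the boot framebuffer); the IOMMU model
identity-maps them first and only then enables translation, so scanout
does not fault across the handover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  64

struct toy_mapping {
	uint64_t iova;
	uint64_t phys;
};

struct toy_iommu {
	bool translating;	/* false = bypass */
	size_t nr_pages;
	struct toy_mapping pages[TOY_MAX_PAGES];
};

/* A region that firmware reports as still in use by a device. */
struct toy_live_region {
	uint64_t base;
	uint64_t size;
};

static bool toy_map_page(struct toy_iommu *mmu, uint64_t iova, uint64_t phys)
{
	if (mmu->nr_pages == TOY_MAX_PAGES)
		return false;
	mmu->pages[mmu->nr_pages].iova = iova;
	mmu->pages[mmu->nr_pages].phys = phys;
	mmu->nr_pages++;
	return true;
}

/* Identity-map every page of a live region (iova == phys). */
static bool toy_identity_map(struct toy_iommu *mmu,
			     const struct toy_live_region *r)
{
	assert(r->base % TOY_PAGE_SIZE == 0 && r->size % TOY_PAGE_SIZE == 0);
	for (uint64_t off = 0; off < r->size; off += TOY_PAGE_SIZE)
		if (!toy_map_page(mmu, r->base + off, r->base + off))
			return false;
	return true;
}

/* Probe: install the live mappings first, only then enable translation. */
static bool toy_iommu_probe(struct toy_iommu *mmu,
			    const struct toy_live_region *regions, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!toy_identity_map(mmu, &regions[i]))
			return false;
	mmu->translating = true;
	return true;
}

/* Device access: true and *phys filled on success, false on a fault. */
static bool toy_translate(const struct toy_iommu *mmu, uint64_t iova,
			  uint64_t *phys)
{
	if (!mmu->translating) {
		*phys = iova;
		return true;
	}
	uint64_t page = iova & ~(TOY_PAGE_SIZE - 1);
	for (size_t i = 0; i < mmu->nr_pages; i++) {
		if (mmu->pages[i].iova == page) {
			*phys = mmu->pages[i].phys | (iova & (TOY_PAGE_SIZE - 1));
			return true;
		}
	}
	return false;
}
```

With a framebuffer region passed to toy_iommu_probe(), an access inside
that region translates to the same physical address, while an access
outside it faults once translation is on. Where the list of live regions
would come from (reserved-memory nodes, EFI GOP, something ACPI-based) is
exactly the part that would need standardizing.
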
> 3) The automatic attach of DMA domain is also causing a different
> problem for us on the GPU side, preventing us from supporting per-
> context pagetables (since we end up with a disagreement about
> which context bank is used between arm-smmu and the firmware).

I'm not sure I understand this issue. Is the context bank hard-coded in
the firmware somehow? Or is it possible to rewrite which one is going to
be used at runtime? Do you switch out the actual page tables rather than
the IOMMU domains for context switching?

> I'm kinda glad that x86 folks were more pragmatic about getting linux
> to work on actual hardware, not just restricting things to hw that
> looked the way they wanted it to.. at some point in arch/arm64 we are
> going to have to decide that reality is a thing. Ignoring that is
> only going to force users and distros to downstream kernels.

You're comparing apples to oranges here. On x86 at least there was some
standardization when Linux started, whereas we still don't really have
that on ARM after so many years of efforts to standardize. I think we
are slowly getting there, but this particular instance shows that we're
not there yet.

Don't get me wrong, I'm not trying to say that we should just ignore
everything that's out there just because it may not be the way we want
it to be. On the other hand if we just take everything as-is and try to
implement workarounds and quirks every step of the way that's going to
also take away a lot of the resources that are already pretty scarce as
it is. I think it needs to be a reasonable compromise.

Also, there doesn't really seem to be a standardizing force in the Linux
on ARM world, so who's going to do that if not the Linux community?