Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

From: Dmitry Osipenko
Date: Thu Apr 08 2021 - 10:20:31 EST


08.04.2021 16:26, Thierry Reding пишет:
> On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote:
>> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
>>> All consumer-grade Android and Chromebook devices show a splash screen
>>> on boot and then display is left enabled when kernel is booted. This
>>> behaviour is unacceptable in a case of implicit IOMMU domains to which
>>> devices are attached during kernel boot since devices, like display
>>> controller, may perform DMA at that time. We can work around this problem
>>> by deferring the enable of SMMU translation for a specific devices,
>>> like a display controller, until the first IOMMU mapping is created,
>>> which works good enough in practice because by that time h/w is already
>>> stopped.
>>>
>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
>>
>> For both patches:
>> Acked-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
>> Tested-by: Nicolin Chen <nicoleotsuka@xxxxxxxxx>
>>
>> The WAR looks good to me. Perhaps Thierry would give some input.

Nicolin, thank you very much for the help!

>> Another topic:
>> I think this may help work around the mc-errors, which we have
>> been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
>> (attached a test patch rebasing on these two)
>
> Ugh... that's exactly what I was afraid of. Now everybody is going to
> think that we can just work around this issue with driver-specific SMMU
> hacks...
>
>> However, GPU would also report errors using DMA domain:
>>
>> nouveau 57000000.gpu: acr: firmware unavailable
>> nouveau 57000000.gpu: pmu: firmware unavailable
>> nouveau 57000000.gpu: gr: firmware unavailable
>> tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffbe200: Security violation (TrustZone violation)
>> nouveau 57000000.gpu: DRM: failed to create kernel channel, -22
>> tegra-mc 70019000.memory-controller: gpusrd: read @0x00000000fffad000: Security violation (TrustZone violation)
>> nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
>> nouveau 57000000.gpu: fifo: SCHED_ERROR 20 []
>>
>> Looking at the address, seems that GPU allocated memory in 32-bit
>> physical address space behind SMMU, so a violation happened after
>> turning on DMA domain I guess...
>
> The problem with GPU is... extra complicated. You're getting these
> faults because you're enabling the IOMMU-backed DMA API, which then
> causes the Nouveau driver allocate buffers using the DMA API instead of
> explicitly allocating pages and then mapping them using the IOMMU API.
> However, there are additional patches needed to teach Nouveau about how
> to deal with SMMU and those haven't been merged yet. I've got prototypes
> of this, but before the whole framebuffer carveout passing work makes
> progress there's little sense in moving individual pieces forward.
>
> One more not to try and cut corners. We know what the right solution is,
> even if it takes a lot of work. I'm willing to ack this patch, or some
> version of it, but only as a way of working around things we have no
> realistic chance of fixing properly anymore. I still think it would be
> best if we could derive identity mappings from command-line arguments on
> these platforms because I think most of them will actually set that, and
> then the solution becomes at least uniform at the SMMU level.
>
> For Tegra210 I've already laid out a path to a solution that's going to
> be generic and extend to Tegra186 and later as well.

We still have issues in the DRM and other drivers that don't allow us to
flip ON the IOMMU_DOMAIN_DMA support.

My patch addresses the issue with the ARM_DMA_USE_IOMMU option, which
allocates the unmanaged domain for DMA purposes on ARM32, causing the
trouble in the multiplatform kernel configuration since it's not
possible to opt-out from ARM_DMA_USE_IOMMU in this case. Perhaps this
needs to be clarified in the commit message.

https://elixir.bootlin.com/linux/v5.12-rc6/source/arch/arm/mm/dma-mapping.c#L2078

https://elixir.bootlin.com/linux/v5.12-rc6/source/drivers/iommu/iommu.c#L1929