Re: [PATCH v1 2/3] drm/tegra: Fix 2d and 3d clients detaching from IOMMU domain

From: Thierry Reding
Date: Thu Oct 24 2019 - 13:28:22 EST


On Thu, Oct 24, 2019 at 07:31:19PM +0300, Dmitry Osipenko wrote:
> 24.10.2019 19:21, Dmitry Osipenko ÐÐÑÐÑ:
> > 24.10.2019 19:09, Dmitry Osipenko ÐÐÑÐÑ:
> >> 24.10.2019 18:57, Dmitry Osipenko ÐÐÑÐÑ:
> >>> 24.10.2019 18:56, Thierry Reding ÐÐÑÐÑ:
> >>>> On Thu, Oct 24, 2019 at 06:47:23PM +0300, Dmitry Osipenko wrote:
> >>>>> 24.10.2019 16:50, Thierry Reding ÐÐÑÐÑ:
> >>>>>> On Thu, Oct 24, 2019 at 04:28:41PM +0300, Dmitry Osipenko wrote:
> >>>>>>> 24.10.2019 14:58, Thierry Reding ÐÐÑÐÑ:
> >>>>>>>> On Sun, Jun 23, 2019 at 08:37:42PM +0300, Dmitry Osipenko wrote:
> >>>>>>>>> This should should fire up on the DRM's driver module re-loader because
> >>>>>>>>> there won't be enough available domains on older Tegra SoCs.
> >>>>>>>>>
> >>>>>>>>> Cc: stable <stable@xxxxxxxxxxxxxxx>
> >>>>>>>>> Fixes: 0c407de5ed1a ("drm/tegra: Refactor IOMMU attach/detach")
> >>>>>>>>> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>> drivers/gpu/drm/tegra/dc.c | 4 ++--
> >>>>>>>>> drivers/gpu/drm/tegra/drm.c | 9 ++++++---
> >>>>>>>>> drivers/gpu/drm/tegra/drm.h | 3 ++-
> >>>>>>>>> drivers/gpu/drm/tegra/gr2d.c | 4 ++--
> >>>>>>>>> drivers/gpu/drm/tegra/gr3d.c | 4 ++--
> >>>>>>>>> 5 files changed, 14 insertions(+), 10 deletions(-)
> >>>>>>>>
> >>>>>>>> I think I understand what this is trying to do, but the commit message
> >>>>>>>> does not help at all. So what's really going on here is that we need to
> >>>>>>>> detach the device from the group regardless of whether we're sharing the
> >>>>>>>> group or not, just like we attach groups to the shared domain whether
> >>>>>>>> they share the same group or not.
> >>>>>>>
> >>>>>>> Yes, the commit's message could be improved.
> >>>>>>>
> >>>>>>>> But in that case, I wonder if it's even worth splitting groups the way
> >>>>>>>> we are right now. Wouldn't it be better to just put all the devices into
> >>>>>>>> the same group and be done with it?
> >>>>>>>>
> >>>>>>>> The current code gives me headaches every time I read it, so if we can
> >>>>>>>> just make it so that all the devices under the DRM device share the same
> >>>>>>>> group, this would become a lot easier to deal with. I'm not really
> >>>>>>>> convinced that it makes much sense to keep them on separate domains,
> >>>>>>>> especially given the constraints on the number of domains available on
> >>>>>>>> earlier Tegra devices.
> >>>>>>>>
> >>>>>>>> Note that sharing a group will also make it much easier for these to use
> >>>>>>>> the DMA API if it is backed by an IOMMU.
> >>>>>>>
> >>>>>>> Probably I'm blanking on everything about IOMMU now.. could you please
> >>>>>>> remind me what "IOMMU group" is?
> >>>>>>>
> >>>>>>> Isn't it that each IOMMU group relates to the HW ID (SWGROUP)? But then
> >>>>>>> each display controller has its own SWGROUP.. and thus that sharing just
> >>>>>>> doesn't make any sense, hm.
> >>>>>>
> >>>>>> IOMMU groups are not directly related to SWGROUPs. But by default the
> >>>>>> IOMMU framework will share a domain between members of the same IOMMU
> >>>>>> group.
> >>>>>
> >>>>> Ah, I re-figured out that again. The memory controller drivers are
> >>>>> defining a single "IOMMU group" for both of the display controllers.
> >>>>>
> >>>>>> Seems like that's really what we want here, so that when we do
> >>>>>> use the DMA API, all the devices part of the DRM device get attached to
> >>>>>> the same IOMMU domain, yet if we don't want to use the DMA API we only
> >>>>>> need to detach the one group from the backing.
> >>>>>
> >>>>> Yes, it should be okay to put all DRM devices into the same group, like
> >>>>> it is done now for the displays. It also should resolve problem with the
> >>>>> domains shortage on T30 since now there are maximum 3 domains in use:
> >>>>> host1x, drm and vde.
> >>>>>
> >>>>> I actually just checked that the original problem still exists
> >>>>> and this change solves it as well:
> >>>>>
> >>>>> ---
> >>>>> diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
> >>>>> index 5a0f6e0a1643..e71096498436 100644
> >>>>> --- a/drivers/memory/tegra/tegra30.c
> >>>>> +++ b/drivers/memory/tegra/tegra30.c
> >>>>> @@ -1021,6 +1021,9 @@ static const struct tegra_smmu_swgroup
> >>>>> tegra30_swgroups[] = {
> >>>>> static const unsigned int tegra30_group_display[] = {
> >>>>> TEGRA_SWGROUP_DC,
> >>>>> TEGRA_SWGROUP_DCB,
> >>>>> + TEGRA_SWGROUP_G2,
> >>>>> + TEGRA_SWGROUP_NV,
> >>>>> + TEGRA_SWGROUP_NV2,
> >>>>> };
> >>>>>
> >>>>> static const struct tegra_smmu_group_soc tegra30_groups[] = {
> >>>>> ---
> >>>>>
> >>>>> Please let me know whether you're going to make a patch or if I should
> >>>>> do it.
> >>>>
> >>>> I've been testing with a similar change and couldn't find any
> >>>> regressions. I've also made the same modifications for Tegra114 and
> >>>> Tegra124.
> >>>>
> >>>> Are you saying that none of these patches are needed anymore? Or do we
> >>>> still need a patch to fix detaching? I'm thinking that maybe we can
> >>>> drastrically simplify the detachment now by dropping the shared
> >>>> parameter altogether.
> >>>>
> >>>> Let me draft a patch and send out the whole set for testing.
> >>>
> >>> Seems it's still not ideal because I noticed this in KMSG:
> >>>
> >>> [ 0.703185] Failed to attached device 54200000.dc to IOMMU_mapping
> >>> [ 0.710404] Failed to attached device 54240000.dc to IOMMU_mapping
> >>> [ 0.719347] Failed to attached device 54140000.gr2d to IOMMU_mapping
> >>> [ 0.719569] Failed to attached device 54180000.gr3d to IOMMU_mapping
> >>>
> >>> which comes from the implicit IOMMU backing.
> >>
> >> And the error comes from here:
> >>
> >> https://elixir.bootlin.com/linux/v5.4-rc2/source/drivers/iommu/iommu.c#L1655
> >
> > So the detaching still should be needed, but at the moment the ARM32
> > DMA-mapping code is simply not suitable for the case of having multiple
> > devices in the same group. I'm wondering whether there are any real
> > users for the implicit IOMMU backing on ARM32 at all :/
> >
>
> Apparently the "Failed to attached device 54200000.dc" was always in the
> log (I rarely testing the default multi-platform config), it's just the
> message is a pr_warn that I wasn't paying attention because it is
> colored like pr_info in dmesg :)

Yeah, so the above isn't a complete solution. In order to actually use
the DMA API backed by an IOMMU, some additional patches are needed. I
have all of those in a local tree and I've already sent out a couple of
them. It's taking a while because they all need to be applied in small
iterations to make sure things don't break midway.

Thierry

Attachment: signature.asc
Description: PGP signature