Re: [PATCH] drm/msm: Initialize MDSS irq domain at probe time

From: Dmitry Baryshkov
Date: Mon Nov 29 2021 - 13:32:42 EST


Hi,

On Mon, 29 Nov 2021 at 17:15, AngeloGioacchino Del Regno
<angelogioacchino.delregno@xxxxxxxxxxxxx> wrote:
>
> Il 29/11/21 03:20, Dmitry Baryshkov ha scritto:
> > Hi,
> >
> > On 25/11/2021 18:09, AngeloGioacchino Del Regno wrote:
> >> Since commit 8f59ee9a570c ("drm/msm/dsi: Adjust probe order"), the
> >> DSI host gets initialized earlier, but this caused unability to probe
> >> the entire stack of components because they all depend on interrupts
> >> coming from the main `mdss` node (mdp5, or dpu1).
> >>
> >> To fix this issue, also anticipate probing mdp5 or dpu1 by initializing
> >> them at msm_pdev_probe() time: this will make sure that we add the
> >> required interrupt controller mapping before dsi and/or other components
> >> try to initialize, finally satisfying the dependency.
> >>
> >> While at it, also change the allocation of msm_drm_private to use the
> >> devm variant of kzalloc().
> >>
> >> Fixes: 8f59ee9a570c ("drm/msm/dsi: Adjust probe order")
> >> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> >
> > I have been thinking about this. I do not feel that this is the correct approach.
> > Currently DRM device exists only when all components are bound. If any of the
> > subdevices is removed, corresponding component is delteted (and thus all components
> > are unbound), the DRM device is taken down. This results in the state cleanup,
> > userspace notifications, etc.
> >
> > With your changes, DRM device will continue to exist even after one of subdevices
> > is removed. This is not an expected behaviour, since subdrivers do not perform full
> > cleanup, delegating that to DRM device takedown.
> >
> > I suppose that proper solution would be to split msm_drv.c into into:
> > - generic components & drm code to be called from mdp4/mdp5/dpu driver (making
> > mdp4, mdp5 or dpu1 the components master)
> >
> > - bare mdss driver, taking care only about IRQs, OF devices population - calling
> > proper mdss_init/mdss_destroy functions. Most probably we can drop this part
> > altogether and just make md5_mdss.c/dpu_mdss.c proper platform drivers.
> >
>
>
> Hmm... getting a better look on how things are structured... yes, I mostly agree
> with you, though I'm not sure about making MDP{4,5}/DPU1 the component master; that
> would result in a major change in drm-msm, which may be "a bit too much".
>
> Don't misunderstand me here, please, major changes are fine - but I feel urgency
> to get this bug solved ASAP (since drm-msm is currently broken at least for drm
> bridges) and, if we do anything major, that would require a very careful slow
> review process that will leave this driver broken for a lot of time.

Yep. I wanted to hear your and Rob's opinion, that's why I did not
rush into implementing it.
I still think this is the way to go in the long term. What I really
liked in your patchset was untying the knot between component binds
returning EPROBE_DEFER and mdss subdevices being in place. This solved
the "dsi host registration" infinite loop for me.

>
> I actually tried something else that "simplifies" the former approach, so here's
> my proposal:
> * we introduce {mdp5,dpu}_mdss_early_init(struct device, struct msm_drm_private)
> * allocate only msm_drm_private in msm_pdev_probe, leaving the drm_dev_alloc call
> into msm_drm_init(), so that the drm_dev_put() stays in msm_drm_uninit()
> * pass msm_drm_private as drvdata instead of drm_device
> * change all the drvdata users to get drm_device from priv->dev, instead of getting
> msm_drm_private from drm_device->dev_private (like many other drm drivers are
> currently doing)

This sounds in a way that it should work. I'm looking forward to
seeing (and testing) your patches.

>
> This way, we keep the current flow of creating the DRM device at msm_drm_init time
> and tearing it down at msm_drm_unbind time, solving the issue that you are
> describing.
>
> If you're okay with this kind of approach, I have two patches here that are 95%
> ready, can finish them off and send briefly.
>
> Though, something else must be noted here... in the last mail where I'm pasting
> a crash that happens when running 'rmmod panel_edp ti_sn65dsi86', I have implied
> that this is happening due to the patch that I've sent: after some more research,
> I'm not convinced anymore that this is a consequence of that. That crash may not
> be related to my fix at all, but to something else (perhaps also related to commit
> 8f59ee9a570c, the one that we're fixing here).
>
> Of course, that crash still happens even with the approach that I've just proposed.
>
>
> Looking forward for your opinion!
>
> Cheers,
> - Angelo



--
With best wishes
Dmitry