Re: [PATCH] drm/msm: Initialize mode_config earlier

From: Bjorn Andersson
Date: Thu Mar 02 2023 - 18:18:02 EST


On Wed, Mar 01, 2023 at 02:58:50PM +0100, Johan Hovold wrote:
> On Tue, Jan 24, 2023 at 09:09:02AM +0100, Johan Hovold wrote:
> > On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> > > On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > > > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> >
> > > > > > Perhaps we have shuffled other things around to avoid this bug? Either
> > > > > > > way, let's put this on hold until further proof that it's still
> > > > > > reproducible.
> > > > >
> > > > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > > > here:
> > > > >
> > > > > https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/
> > > > >
> > > > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > > > > could very well be that something has changed that fixes (or hides) the
> > > > > issue since.
> > > >
> > > > For unrelated reasons, I tried enabling async probing, and apart from
> > > > apparently causing the panel driver to probe defer indefinitely, I also
> > > > again hit the WARN_ON() I had added to catch this:
> > > >
> > > > [ 13.593235] WARNING: CPU: 0 PID: 125 at drivers/gpu/drm/drm_probe_helper.c:664 drm_kms_helper_hotplug_event+0x48/0x70 [drm_kms_helper]
> >
> > > > So the bug still appears to be there (and the MSM DRM driver is fragile
> > > > and broken, but we knew that).
> > > >
> > >
> > > But the ordering between mode_config.funcs = !NULL and
> > > drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
> > >
> > > And my testing shows that drm_kms_helper_poll_init() is the cause for
> > > getting bridge->hpd_cb != NULL.
> > >
> > > So the ordering seems legit, unless there's something else causing the
> > > assignment of bridge->hpd_cb to happen earlier in this scenario.
> >
> > I'm not saying that this patch is correct (indeed it doesn't seem to
> > be), but only that the bug I reported still appears to be present in
> > 6.2.
>
> So after debugging this issue a third time, I can conclude that it is
> still very much present in 6.2.
>
> It appears you looked at the linux-next tree when you concluded that
> this patch was not needed. In 6.2 the bridge->hpd_cb callback is set
> before mode_config.funcs is initialised as part of
> kms->funcs->hw_init(kms).
>
> The hpd DRM changes heading into 6.3 do appear to avoid the NULL-pointer
> dereference by moving the bridge->hpd_cb initialisation to
> drm_kms_helper_poll_init() as you mention above.
>
> The PMIC GLINK altmode driver still happily forwards notifications
> regardless of the DRM driver state though, which can lead to missed
> hotplug events. It seems you need to implement the
> hpd_enable()/disable() callbacks and either cache or not enable events
> in fw until the DRM driver is ready.
>

It's not clear to me what the expectation from the DRM framework is on
this point. We register a drm_bridge which is only capable of signaling
HPD events (DRM_BRIDGE_OP_HPD), not querying HPD state (DRM_BRIDGE_OP_DETECT).

Does this imply that any such bridge must ensure that HPD events are
re-delivered once hpd_enable() has been invoked (we can't invoke it from
hpd_enable()...)?

Is it reasonable to do this retriggering in the altmode driver? Or is it
the job of the TCPM (it seems reasonable to not send the PAN_EN message
until we get hpd_enable()...)?
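
To make the "cache and re-deliver" option concrete, here is a rough
user-space model in plain C. It is only a sketch under assumptions: every
name in it (struct altmode_port, port_fw_event(), port_hpd_enable(),
port_replay_work(), port_notify_drm()) is hypothetical, and none of it is
the actual altmode or DRM bridge code. port_notify_drm() merely stands in
for whatever forwards the event to DRM (e.g. drm_bridge_hpd_notify()), and
port_replay_work() stands in for deferred work scheduled after
hpd_enable(), since the replay can't happen from within the callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port state; not taken from any real driver. */
struct altmode_port {
	bool hpd_enabled;    /* DRM side has called hpd_enable() */
	bool have_cached;    /* an event arrived while HPD was disabled */
	bool cached_state;   /* most recent state reported by firmware */
	int delivered;       /* number of events forwarded to DRM */
	bool last_delivered; /* state carried by the last forwarded event */
};

/* Stand-in for forwarding an HPD event to the DRM bridge. */
static void port_notify_drm(struct altmode_port *p, bool connected)
{
	/* Invariant of this model: nothing reaches DRM while HPD is off. */
	assert(p->hpd_enabled);
	p->delivered++;
	p->last_delivered = connected;
}

/* Firmware notification path: forward if enabled, otherwise cache. */
static void port_fw_event(struct altmode_port *p, bool connected)
{
	if (!p->hpd_enabled) {
		/* Only the latest state matters, so overwrite any older one. */
		p->have_cached = true;
		p->cached_state = connected;
		return;
	}
	port_notify_drm(p, connected);
}

/* Models hpd_enable(): only flips the flag, does not notify. */
static void port_hpd_enable(struct altmode_port *p)
{
	p->hpd_enabled = true;
}

/* Models hpd_disable(): later events are cached again. */
static void port_hpd_disable(struct altmode_port *p)
{
	p->hpd_enabled = false;
}

/* Models deferred work run after hpd_enable() has returned. */
static void port_replay_work(struct altmode_port *p)
{
	if (p->hpd_enabled && p->have_cached) {
		p->have_cached = false;
		port_notify_drm(p, p->cached_state);
	}
}
```

In this model, an event that arrives before port_hpd_enable() is held back
and forwarded exactly once by the next port_replay_work(); events that
arrive afterwards are forwarded immediately. The alternative of not
enabling events in firmware until hpd_enable() would avoid the cache
entirely, at the cost of pushing the state tracking to the other side.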

Regards,
Bjorn