Re: [PATCH] drm/msm: Initialize mode_config earlier

From: Johan Hovold
Date: Wed Mar 01 2023 - 08:58:32 EST


On Tue, Jan 24, 2023 at 09:09:02AM +0100, Johan Hovold wrote:
> On Mon, Jan 23, 2023 at 09:17:49AM -0800, Bjorn Andersson wrote:
> > On Mon, Jan 23, 2023 at 05:01:45PM +0100, Johan Hovold wrote:
> > > On Tue, Jan 17, 2023 at 09:04:39AM +0100, Johan Hovold wrote:
> > > > On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
>
> > > > > Perhaps we have shuffled other things around to avoid this bug? Either
> > > > > way, let's put this on hold until further proof that it's still
> > > > > reproducible.
> > > >
> > > > As I've mentioned off list, I haven't hit the apparent race I reported
> > > > here:
> > > >
> > > > https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/
> > > >
> > > > since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
> > > > could very well be that something has changed that fixes (or hides) the
> > > > issue since.
> > >
> > > For unrelated reasons, I tried enabling async probing, and apart from
> > > apparently causing the panel driver to probe defer indefinitely, I also
> > > again hit the WARN_ON() I had added to catch this:
> > >
> > > [ 13.593235] WARNING: CPU: 0 PID: 125 at drivers/gpu/drm/drm_probe_helper.c:664 drm_kms_helper_hotplug_event+0x48/0x70 [drm_kms_helper]
>
> > > So the bug still appears to be there (and the MSM DRM driver is fragile
> > > and broken, but we knew that).
> > >
> >
> > But the ordering between mode_config.funcs = !NULL and
> > drm_kms_helper_poll_init() in msm_drm_init() seems pretty clear.
> >
> > And my testing shows that drm_kms_helper_poll_init() is the cause for
> > getting bridge->hpd_cb != NULL.
> >
> > So the ordering seems legit, unless there's something else causing the
> > assignment of bridge->hpd_cb to happen earlier in this scenario.
>
> I'm not saying that this patch is correct (indeed it doesn't seem to
> be), but only that the bug I reported still appears to be present in
> 6.2.

So after debugging this issue a third time, I can conclude that it is
still very much present in 6.2.

It appears you looked at the linux-next tree when you concluded that
this patch was not needed. In 6.2 the bridge->hpd_cb callback is set
before mode_config.funcs is initialised as part of
kms->funcs->hw_init(kms).

The hpd DRM changes heading into 6.3 do appear to avoid the NULL-pointer
dereference by moving the bridge->hpd_cb initialisation to
drm_kms_helper_poll_init() as you mention above.
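
To make the ordering argument concrete, here is a minimal userspace sketch of
the two initialisation orders described above. Every name in it is a
hypothetical stand-in (model_dev, model_hotplug and so on), not the real DRM
or MSM code: "funcs" models mode_config.funcs and "hpd_cb_set" models
bridge->hpd_cb being non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not the real DRM structures. */
struct model_funcs {
	void (*output_changed)(int *counter);
};

struct model_dev {
	const struct model_funcs *funcs;	/* models mode_config.funcs */
	bool hpd_cb_set;			/* models bridge->hpd_cb != NULL */
	int handled;				/* events delivered safely */
	int unsafe;				/* events that arrived with funcs == NULL */
};

static void model_output_changed(int *counter)
{
	(*counter)++;
}

static const struct model_funcs model_funcs_impl = {
	.output_changed = model_output_changed,
};

/* Models a hotplug notification arriving from the bridge. */
static void model_hotplug(struct model_dev *dev)
{
	if (!dev->hpd_cb_set)
		return;		/* no callback registered, event is dropped */
	if (!dev->funcs) {
		dev->unsafe++;	/* a real driver would dereference NULL here */
		return;
	}
	dev->funcs->output_changed(&dev->handled);
}

/* Ordering as described for 6.2: callback registered first, funcs later. */
static void model_init_callback_first(struct model_dev *dev, bool event_in_window)
{
	dev->hpd_cb_set = true;
	if (event_in_window)
		model_hotplug(dev);
	dev->funcs = &model_funcs_impl;
}

/* Ordering as described for 6.3: funcs first, callback registered last. */
static void model_init_funcs_first(struct model_dev *dev, bool event_in_window)
{
	dev->funcs = &model_funcs_impl;
	if (event_in_window)
		model_hotplug(dev);
	dev->hpd_cb_set = true;
}
```

With the callback registered first, an event in the window is counted as
unsafe. With funcs set first, the same event is silently dropped instead,
which is safe but means the notification is lost.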

The PMIC GLINK altmode driver still happily forwards notifications
regardless of the DRM driver state though, which can lead to missed
hotplug events. It seems you need to implement the
hpd_enable()/hpd_disable() callbacks and either cache events or leave
them disabled in firmware until the DRM driver is ready.
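
As a rough illustration of the caching option, the following is a minimal
userspace sketch, not the actual PMIC GLINK altmode or DRM bridge code. The
hpd_source type, its fields and the hpd_log helper are all hypothetical names
made up for this example. It assumes that only the most recent connector state
needs to be replayed once the consumer enables HPD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names are the real altmode or DRM API. */
struct hpd_source {
	bool enabled;		/* set by hpd_source_enable() */
	bool pending;		/* a notification arrived while disabled */
	bool cached_state;	/* last state seen while disabled */
	void (*notify)(void *data, bool connected);	/* consumer callback */
	void *data;
};

/* Simple recording consumer used to observe what gets forwarded. */
struct hpd_log {
	int count;
	bool last;
};

static void hpd_log_notify(void *data, bool connected)
{
	struct hpd_log *log = data;

	log->count++;
	log->last = connected;
}

/* Firmware notification path: forward if enabled, otherwise cache. */
static void hpd_source_fw_event(struct hpd_source *src, bool connected)
{
	if (!src->enabled) {
		src->cached_state = connected;
		src->pending = true;
		return;
	}
	src->notify(src->data, connected);
}

/* Called once the consumer is ready: replay the last cached state, if any. */
static void hpd_source_enable(struct hpd_source *src)
{
	assert(src->notify != NULL);
	src->enabled = true;
	if (src->pending) {
		src->pending = false;
		src->notify(src->data, src->cached_state);
	}
}

/* Called when the consumer goes away: later events are cached again. */
static void hpd_source_disable(struct hpd_source *src)
{
	src->enabled = false;
}
```

In this model an event that arrives before hpd_source_enable() is not lost: it
is delivered once when the consumer enables HPD. The alternative mentioned
above, not enabling events in firmware at all until then, would move the same
gating into the firmware interface instead.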

Johan