Re: [PATCH v3 3/4] drm/bridge: Add devm_drm_bridge_add()
From: Doug Anderson
Date: Fri Jun 03 2022 - 10:56:43 EST
Hi,
On Fri, Jun 3, 2022 at 7:14 AM Maxime Ripard <maxime@xxxxxxxxxx> wrote:
>
> On Fri, Jun 03, 2022 at 01:19:16PM +0300, Dmitry Baryshkov wrote:
> > On Fri, 3 Jun 2022 at 11:21, Maxime Ripard <maxime@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 31, 2022 at 02:06:34PM -0700, Doug Anderson wrote:
> > > > On Mon, May 23, 2022 at 10:00 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote:
> > > > > On Sat, May 21, 2022 at 2:17 AM Maxime Ripard <maxime@xxxxxxxxxx> wrote:
> > > > > > On Tue, May 10, 2022 at 12:29:43PM -0700, Douglas Anderson wrote:
> > > > > > > This adds a devm managed version of drm_bridge_add(). Like other
> > > > > > > "devm" functions listed in drm_bridge.h, this function takes an
> > > > > > > explicit "dev" to use for the lifetime management. A few notes:
> > > > > > > * In general we have a "struct device" for bridges that makes a good
> > > > > > > candidate for where the lifetime matches exactly what we want.
> > > > > > > * The "bridge->dev->dev" device appears to be the encoder
> > > > > > > device. That's not the right device to use for lifetime management.
> > > > > > >
> > > > > > > Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>
> > > > > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > > >
> > > > > > If we are to introduce more managed helpers, I think it'd be wiser to
> > > > > > introduce them as DRM-managed, and not device managed.
> > > > > >
> > > > > > Otherwise, you'll end up in a weird state when a device has been removed
> > > > > > but the DRM device is still around.
> > > > >
> > > > > I'm kinda confused. In this case there is no DRM device for the bridge
> > > > > and, as per my CL description, "bridge->dev->dev" appears to be the
> > > > > encoder device. I wasn't personally involved in discussions about it,
> > > > > but I was under the impression that this was expected / normal. Thus
> > > > > we can't make this DRM-managed.
> > > >
> > > > Since I didn't hear a reply,
> > >
> > > Gah, I replied but it looks like somehow it never reached the ML...
> > >
> > > Here was my original reply:
> > >
> > > > > > This adds a devm managed version of drm_bridge_add(). Like other
> > > > > > "devm" functions listed in drm_bridge.h, this function takes an
> > > > > > explicit "dev" to use for the lifetime management. A few notes:
> > > > > > * In general we have a "struct device" for bridges that makes a good
> > > > > > candidate for where the lifetime matches exactly what we want.
> > > > > > * The "bridge->dev->dev" device appears to be the encoder
> > > > > > device. That's not the right device to use for lifetime management.
> > > > > >
> > > > > > Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>
> > > > > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > > > >
> > > > > If we are to introduce more managed helpers, I think it'd be wiser to
> > > > > introduce them as DRM-managed, and not device managed.
> > > > >
> > > > > Otherwise, you'll end up in a weird state when a device has been removed
> > > > > but the DRM device is still around.
> > > >
> > > > I'm kinda confused. In this case there is no DRM device for the bridge
> > > > and, as per my CL description, "bridge->dev->dev" appears to be the
> > > > encoder device.
> > >
> > > bridge->dev seems right though?
> > >
> > > > I wasn't personally involved in discussions about it, but I was under
> > > > the impression that this was expected / normal. Thus we can't make
> > > > this DRM-managed.
> > >
> > > Still, I don't think devm is the right solution to this either.
> > >
> > > The underlying issue is two-fold:
> > >
> > > - Encoders can have a pointer to a bridge through of_drm_find_bridge
> > > or similar. However, bridges are traditionally tied to their device
> > > lifetime (by calling drm_bridge_add in probe, and drm_bridge_remove
> > > in remove). Encoders will typically be tied to the DRM device
> > > however, and that one sticks around until the last application
> > > closes it. We can thus very easily end up with a dangling pointer,
> > > and a use-after-free.
> > >
> > > - It's not the case yet, but it doesn't seem far-fetched to expose
> > > properties of bridges to userspace. In that case, userspace
> > > would be likely to still hold references to objects that aren't
> > > there anymore when the bridge is gone.
> > >
> > > The first is obviously a larger concern, but if we can find a solution
> > > that would accommodate the second it would be great.
> > >
> > > As far as I can see, we should fix in two steps:
> > >
> > > - in drm_bridge_attach, we should add a device-managed call that will
> > > unregister the main DRM device. In most cases we don't allow the main
> > > DRM device to probe when the bridge isn't there yet, so it makes
> > > sense to remove it once the bridge is no longer there as well.
> >
> > The problem is that I do not see a good way to unregister the main DRM
> > device outside of its driver code.
>
> That's what drmm helpers are doing though: they'll defer the cleanup
> > until the last user has closed its fd.

I'm a bit confused here. I'll take the concrete example of ps8640
since that's what I was working on here.

...right now the fact that we're using devm means that
drm_bridge_remove() will get called when a ps8640 device is unbound,
right? I guess you're saying that the "drm_bridge" memory needs to
outlast this, right? That being said, even if the actual memory for
drm_bridge outlasts the ps8640 driver lifetime, much of the data would
need to be marked invalid I think. If nothing else all function
pointers that point into the driver would have to be made NULL, right?
Once the device has been unbound it's possible that the underlying
module might be removed. I suspect that we'd need to do more than just
bogus-up the function pointers, though.

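To make the devm behaviour above concrete, here is a minimal userspace
sketch (a mock, not kernel code) of the pattern: the bridge is added
right away and the matching remove is queued as a cleanup action that
runs when the device is unbound. All of the mock_* names are made up
for illustration; the real helpers being mimicked are drm_bridge_add(),
drm_bridge_remove() and devm_add_action_or_reset().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock, NOT kernel code: every name below is hypothetical. */

#define MAX_ACTIONS 8

struct mock_action {
	void (*fn)(void *data);
	void *data;
};

/* Stand-in for "struct device": holds cleanup actions run at unbind. */
struct mock_device {
	struct mock_action actions[MAX_ACTIONS];
	size_t nr_actions;
};

/* Stand-in for "struct drm_bridge". */
struct mock_bridge {
	int registered;
};

static void mock_bridge_add(struct mock_bridge *bridge)
{
	bridge->registered = 1;
}

static void mock_bridge_remove(void *data)
{
	struct mock_bridge *bridge = data;

	bridge->registered = 0;
}

/* Queue a cleanup action; if the queue is full, run it now and fail. */
static int mock_devm_add_action_or_reset(struct mock_device *dev,
					 void (*fn)(void *), void *data)
{
	if (dev->nr_actions == MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	dev->actions[dev->nr_actions].fn = fn;
	dev->actions[dev->nr_actions].data = data;
	dev->nr_actions++;
	return 0;
}

/* The pattern under discussion: add now, remove when "dev" is unbound. */
static int mock_devm_bridge_add(struct mock_device *dev,
				struct mock_bridge *bridge)
{
	mock_bridge_add(bridge);
	return mock_devm_add_action_or_reset(dev, mock_bridge_remove, bridge);
}

/* Unbind: run the queued actions in reverse order of registration. */
static void mock_device_unbind(struct mock_device *dev)
{
	while (dev->nr_actions > 0) {
		struct mock_action *a = &dev->actions[--dev->nr_actions];

		a->fn(a->data);
	}
}
```

After mock_device_unbind(), anything that kept a pointer to the bridge
(an encoder, say) still holds that pointer, but the bridge is no longer
registered. The mock says nothing about who owns the memory, which is
the part being debated in this thread.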
...so it feels like any solution here needs to take into account
_both_ the lifetime of the "struct device" and the "struct
drm_device". If the "struct device" goes away but the "struct
drm_device" is still around then we need to essentially transition the
"struct drm_device" over to a dummy, right? In my perhaps naive view
that means that a drmm_bridge_add() wouldn't be enough because it
wouldn't know when the "struct device" went away.
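
One way to picture tracking both lifetimes is a reference-counted
bridge whose driver callbacks are swapped for dummies at unbind time.
The following is only a userspace sketch of that idea under stated
assumptions, not an existing DRM API: every mock_* name, the refcount
scheme and the dummy callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock, NOT kernel code: every name below is hypothetical. */

struct mock_bridge;

struct mock_bridge_funcs {
	int (*enable)(struct mock_bridge *bridge);
};

struct mock_bridge {
	const struct mock_bridge_funcs *funcs; /* points into the "driver" */
	int refcount;                          /* holders that may still call in */
	int freed;                             /* set where real code would free */
};

/* Driver callback; must not be called once the driver is unbound. */
static int driver_enable(struct mock_bridge *bridge)
{
	(void)bridge;
	return 0;
}

/* Dummy callback owned by the "core"; it only reports that the hw is gone. */
static int dummy_enable(struct mock_bridge *bridge)
{
	(void)bridge;
	return -1; /* a kernel version might return -ENODEV instead */
}

static const struct mock_bridge_funcs driver_funcs = { .enable = driver_enable };
static const struct mock_bridge_funcs dummy_funcs = { .enable = dummy_enable };

static void mock_bridge_init(struct mock_bridge *bridge)
{
	bridge->funcs = &driver_funcs;
	bridge->refcount = 1; /* reference owned by the bridge's device */
	bridge->freed = 0;
}

static void mock_bridge_get(struct mock_bridge *bridge)
{
	bridge->refcount++;
}

static void mock_bridge_put(struct mock_bridge *bridge)
{
	if (--bridge->refcount == 0)
		bridge->freed = 1;
}

/* Event 1: the bridge's "struct device" is unbound. */
static void mock_bridge_device_unbound(struct mock_bridge *bridge)
{
	bridge->funcs = &dummy_funcs; /* nothing points into the driver now */
	mock_bridge_put(bridge);
}

/* Event 2: the "struct drm_device" lets go of the bridge. */
static void mock_drm_device_released(struct mock_bridge *bridge)
{
	mock_bridge_put(bridge);
}
```

The point of the sketch is that two separate events are needed: the
unbind swaps the callbacks, and the memory is only released once the
last holder drops its reference. A helper tied to only one of the two
lifetimes would miss the other event.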
> > > - When the DRM device is removed, have the core clean up any bridge
> > > registered. That will remove the need to have drm_bridge_remove in
> > > the first place.
> > >
> > > > I'll assume that my response addressed your concerns. Assuming I get
> > > > reviews for the other two patches in this series I'll plan to land
> > > > this with Dmitry's review.
> > >
> > > I still don't think it's a good idea to merge it. It gives an illusion
> > > of being safe, but it's really far from it.
> >
> > It is more of removing the boilerplate code spread over all the
> > drivers rather than about particular safety.
> >
> > I'd propose to land devm_drm_bridge_add (and deprecate calling
> > drm_bridge_remove from the bridge driver at some point) and work on
> > the whole drm_device <-> drm_bridge problem in the meantime.
>
> Do you really expect that to happen? :)
>
> Anyway, it's been merged, so it's too late now. I don't really feel
> like it's a good thing, but it doesn't really make the situation worse
> either.
A revert is really not that hard to do if the consensus is that we
really don't want this.
-Doug