Re: [RFC PATCH 1/6] drm/dp: Helpers to make it easier for drivers to use DP AUX bus properly

From: Doug Anderson
Date: Fri Apr 15 2022 - 17:13:28 EST


Hi,

On Thu, Apr 14, 2022 at 4:52 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Douglas Anderson (2022-04-08 19:36:23)
> > As talked about in the kerneldoc for "struct dp_aux_ep_client" in this
> > patch and also in the past in commit a1e3667a9835 ("drm/bridge:
> > ti-sn65dsi86: Promote the AUX channel to its own sub-dev"), to use the
> > DP AUX bus properly we really need two "struct device"s. One "struct
> > device" is in charge of providing the DP AUX bus and the other is
> > where we'll try to get a reference to the newly probed endpoint
> > devices.
> >
> > In ti-sn65dsi86 this wasn't too difficult to accomplish. That driver
> > is already broken up into several "struct devices" anyway because it
> > also provides a PWM and some GPIOs. Adding one more wasn't that
> > difficult / ugly.
> >
> > When I tried to do the same solution in parade-ps8640, it felt like I
> > was copying too much boilerplate code. I made the realization that I
> > didn't _really_ need a separate "driver" for each person that wanted
> > to do the same thing. By putting all the "driver" related code in a
> > common place then we could save a bit of hassle. This change
> > effectively adds a new "ep_client" driver that can be used by
> > anyone. The devices instantiated by this driver will just call through
> > to the probe/remove/shutdown calls provided.
> >
> > At the moment, the "ep_client" driver is backed by the Linux auxiliary
> > bus (unfortunate naming--this has nothing to do with DP AUX). I didn't
> > want to expose this to clients, though, so as far as clients are
> > concerned they get a vanilla "struct device".
> >
> > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > ---
>
> Is it correct that the struct dp_aux_ep_client is largely equivalent to
> a struct dp_aux_ep_device? What's the difference? I think it is a fully
> probed dp_aux_ep_device? Or a way to tell the driver that calls
> of_dp_aux_populate_ep_devices() that the endpoint has probed now?

They're not the same. The "DP AUX Endpoint Device" is essentially the
panel, though at one point in time someone argued that conceivably one
could put other devices on it even though this might be a really bad
idea. At some point in the discussion someone mentioned the concept of
a touchscreen running over DP Aux had been discussed and, indeed, "dp
aux touchscreen" gets hits if you search for it. The idea that it
could be something different is one reason why I refrained from
calling it a panel in all the function names and always tried to call
it a "DP AUX Endpoint".

The "DP AUX Endpoint Device Client" is the client of the panel, or the
code that needs to get a reference to the panel. Logically, I guess
this is the part of the eDP controller that's responsible for shoving
bits to the panel. Essentially the drm_bridge. Most importantly, it's
_not_ the part of the eDP controller providing the AUX bus.


> I read the commit text but it didn't click until I read the next patch
> that this is solving a probe ordering/loop problem. The driver that
> creates the 'struct drm_dp_aux' and populates devices on the DP aux bus
> with of_dp_aux_populate_ep_devices() wants to be sure that the
> drm_bridge made by the 'struct dp_aux_ep_driver' probe routine in
> edp-panel is going to be there before calling drm_of_get_bridge().
>
> of_dp_aux_populate_ep_devices();
> [No idea if the bridge is registered yet]
> drm_of_get_bridge();
>
> The solution is to retry the drm_of_get_bridge() until 'struct
> dp_aux_ep_driver' probes and registers the next bridge. It looks like a
> wait_for_completion() on top of drm_of_get_bridge() implemented through
> driver probe and -EPROBE_DEFER; no disrespect!

Yes, that's exactly what it is.


> Is there another problem here that the driver that creates the 'struct
> drm_dp_aux' and populates devices on the DP aux bus with
> of_dp_aux_populate_ep_devices() wants to use that same 'struct
> drm_dp_aux' directly to poke at some 'struct dp_aux_ep_device' that was
> populated? And the 'struct dp_aux_ep_device' may either not be probed
> and bound to a 'struct dp_aux_ep_driver' or it may not be powered on
> because it went to runtime PM suspend?
>
> Put another way, I worry that the owner of 'struct drm_dp_aux' wants to
> use some function in include/drm/dp/drm_dp_helper.h that takes the
> 'struct drm_dp_aux' directly without considering the device model state
> of the endpoint device (the 'struct dp_aux_ep_device'). That would be a
> similar problem as waiting for the endpoint to be powered on and probed.
> Solving that through a sub-driver feels awkward.
>
> What if we had some function like drm_dp_get_aux_ep() that took a
> 'struct drm_dp_aux' and looked for the endpoint device (there can only
> be one?),

The code is designed to handle the fact that there could be more than
one AUX endpoint device. I don't know if this will ever happen but it
is plausible. The "touchscreen over DP AUX", if that ever became a
thing supported in Linux, could be an example. In some sense, I guess
we could have modeled the AUX backlight for homestar this way as a
second "DP AUX Endpoint", though we didn't...


> waited for it to be probed,

The $1M question: where should it be doing the waiting? Are you
imagining this straight in my probe? AKA:

def ps8640_probe(...):
...
devm_of_dp_aux_populate_ep_devices(...);
do_the_wait(...);
bridge = devm_drm_of_get_bridge(...);
if (bridge == -EPROBE_DEFER)
return -EPROBE_DEFER;
...

Essentially, if the panel is probing asynchronously then this would
wait for it. If the panel instead needs some resources then it should
be fine for us to just pass the -EPROBE_DEFER up and we'll both try
probing again later. That definitely feels simpler to me and matches
how I would wish things to work.

There are two problems:

1. I'm not sure how to wait. Early in ps8640 support I had Philip try
wait_for_device_probe(). That wasn't so happy. I suppose I could try
to come up with some solution if this is indeed the way we want to go.

2. The way probing currently works, if we end up in the -EPROBE_DEFER
case then we end up with an infinite loop. :( As I understand it, the
basic rule is that if your probe causes any additional devices to be
created (like by calling devm_of_dp_aux_populate_ep_devices()) then
your probe isn't allowed to return -EPROBE_DEFER after the call. This
is the same problem that the main msm used to have (Dmitry says it's
fixed now). I think this is just a fundamental design issue with
deferred probing. Anytime a device probes then it causes
driver_deferred_probe_trigger() to run again. So we basically have:

a) ps8640 probe returns -EPROBE_DEFER for whatever reason.
b) some device in the system probes and all deferred probes are retriggered.
c) ps8640 gets pulled off the list by the worker
d) ps8640 probe gets called
e) ps8640 creates a sub device for the panel, which triggers deferred probing
f) ps8640 returns and gets put on the deferred list
g) goto step c)

On first instinct, you might think this is easy to solve. Maybe
somehow you could make it so the "trigger" doesn't re-trigger the
ps8640 since it hasn't actually _returned_ from probing yet. The
problem is that I believe there'd be a race in the async probe case.
See the description for driver_deferred_probe_trigger() for some
details.


What we _could_ do is that we could re-create deferred probing
ourselves. :-P So instead of creating a sub-device, I could kick off
some type of asynchronous task that would periodically run and look to
see if the panel has shown up. Then, once the panel shows up then I
could create the bridge. I'm not really convinced that this is better,
though.


> and then powered it up via some
> pm_runtime_get_sync()? My understanding is that with edp-panel we have
> two power "domains" of importance, the panel power domain and the DP/eDP
> power domain. Access to the AUX bus via 'struct drm_dp_aux' needs both
> power domains to be powered on. If the 'struct dp_aux_ep_device' is in
> hand, then both power domains can be powered on by making sure the power
> state of the 'struct dp_aux_ep_device::dev' is enabled. If only the
> 'struct drm_dp_aux' is in hand then we'll need to do something more like
> find the child device and power it on.

So right now it doesn't work that way. The whole thing is structured
more like an i2c bus. The i2c bus doesn't power its devices on. The
devices are in charge of powering themselves on. If the i2c bus itself
has a low power state that it can be in when the devices don't need to
communicate then that's fine. It can power itself on whenever the
devices need to communicate. If this is a high-cost thing then the bus
can use pm_runtime. Following this model is what leads us to the panel
being in charge of reading the EDID.


> Could the 'struct drm_dp_aux' functions in drm_dp_helper.h do this
> automatically somehow? Maybe we can drop a variable in 'struct
> drm_dp_aux' when of_dp_aux_populate_ep_devices() is called that the
> other side may not be powered up. Then if something tries to transfer on
> that aux channel it powers on all children of 'struct drm_dp_aux::dev'
> that are on the 'dp_aux_bus_type' or bails out if those devices haven't
> probed yet or can't be powered on.

We can have more discussions about powering and who's in charge of
powering who if we need to, but it's not really what this series is
about. Here we're worried about making sure that we acquire all of our
resources before we create the drm_bridge. Said another way: we're
trying to acquire a handle to the panel before we add the bridge
because once we add the bridge we're asserting that we're all ready to
go and start using things, right? So we basically just want to make
sure that the panel is present and able to acquire all of _its_
resources.



-Doug