RE: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing payloads if last connected port isn't connected
From: Lin, Wayne
Date: Tue Aug 09 2022 - 23:29:34 EST
[Public]
> -----Original Message-----
> From: Lyude Paul <lyude@xxxxxxxxxx>
> Sent: Wednesday, August 3, 2022 6:12 AM
> To: Lin, Wayne <Wayne.Lin@xxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> nouveau@xxxxxxxxxxxxxxxxxxxxx; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>; Zuo, Jerry
> <Jerry.Zuo@xxxxxxx>; Jani Nikula <jani.nikula@xxxxxxxxx>; Imre Deak
> <imre.deak@xxxxxxxxx>; Daniel Vetter <daniel.vetter@xxxxxxxx>; Sean Paul
> <sean@xxxxxxxxxx>; David Airlie <airlied@xxxxxxxx>; Daniel Vetter
> <daniel@xxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>; Lakha,
> Bhawanpreet <Bhawanpreet.Lakha@xxxxxxx>; open list <linux-
> kernel@xxxxxxxxxxxxxxx>
> Subject: Re: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing
> payloads if last connected port isn't connected
>
> On Tue, 2022-07-05 at 08:45 +0000, Lin, Wayne wrote:
> > [Public]
> >
> >
> >
> > > -----Original Message-----
> > > From: Lyude Paul <lyude@xxxxxxxxxx>
> > > Sent: Wednesday, June 8, 2022 3:30 AM
> > > To: dri-devel@xxxxxxxxxxxxxxxxxxxxx; nouveau@xxxxxxxxxxxxxxxxxxxxx;
> > > amd- gfx@xxxxxxxxxxxxxxxxxxxxx
> > > Cc: Lin, Wayne <Wayne.Lin@xxxxxxx>; Ville Syrjälä
> > > <ville.syrjala@xxxxxxxxxxxxxxx>; Zuo, Jerry <Jerry.Zuo@xxxxxxx>;
> > > Jani Nikula <jani.nikula@xxxxxxxxx>; Imre Deak
> > > <imre.deak@xxxxxxxxx>; Daniel Vetter <daniel.vetter@xxxxxxxx>; Sean
> > > Paul <sean@xxxxxxxxxx>; David Airlie <airlied@xxxxxxxx>; Daniel
> > > Vetter <daniel@xxxxxxxx>; Thomas Zimmermann
> <tzimmermann@xxxxxxx>;
> > > Lakha, Bhawanpreet <Bhawanpreet.Lakha@xxxxxxx>; open list
> > > <linux-kernel@xxxxxxxxxxxxxxx>
> > > Subject: [RESEND RFC 15/18] drm/display/dp_mst: Skip releasing
> > > payloads if last connected port isn't connected
> > >
> > > In the past, we've ran into strange issues regarding errors in
> > > response to trying to destroy payloads after a port has been
> > > unplugged. We fixed this back in:
> > >
> > > This is intended to replace the workaround that was added here:
> > >
> > > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > > ports in stale topology")
> > >
> > > which was intended fix to some of the payload leaks that were
> > > observed before, where we would attempt to determine if the port was
> > > still connected to the topology before updating payloads using
> > > drm_dp_mst_port_downstream_of_branch. This wasn't a particularly
> > > good solution, since one of the points of still having port and mstb
> > > validation is to avoid sending messages to newly disconnected
> > > branches wherever possible
> > > - thus the required use of drm_dp_mst_port_downstream_of_branch
> > > would indicate something may be wrong with said validation.
> > >
> > > It seems like it may have just been races and luck that made
> > > drm_dp_mst_port_downstream_of_branch work however, as while I
> was
> > > trying to figure out the true cause of this issue when removing the
> > > legacy MST code - I discovered an important excerpt in section
> > > 2.14.2.3.3.6 of the DP
> > > 2.0
> > > specs:
> > >
> > > "BAD_PARAM - This reply is transmitted when a Message Transaction
> > > parameter is in error; for example, the next port number is invalid
> > > or /no device is connected/ to the port associated with the port number."
> > >
> > > Sure enough - removing the calls to
> > > drm_dp_mst_port_downstream_of_branch()
> > > and instead checking the ->ddps field of the parent port to see
> > > whether we should release a given payload or not seems to totally
> > > fix the issue. This does actually make sense to me, as it seems the
> > > implication is that given a topology where an MSTB is removed, the
> > > payload for the MST parent's port will be released automatically if
> > > that port is also marked as disconnected.
> > > However, if there's another parent in the chain after that which is
> > > connected
> > > - payloads must be released there with an ALLOCATE_PAYLOAD message.
> > >
> > > So, let's do that!
> > >
> > > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> > > Cc: Wayne Lin <Wayne.Lin@xxxxxxx>
> > > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > Cc: Fangzhi Zuo <Jerry.Zuo@xxxxxxx>
> > > Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> > > Cc: Imre Deak <imre.deak@xxxxxxxxx>
> > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > Cc: Sean Paul <sean@xxxxxxxxxx>
> > > ---
> > > drivers/gpu/drm/display/drm_dp_mst_topology.c | 51
> > > +++++++------------
> > > 1 file changed, 17 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > index dd314586bac3..70adb8db4335 100644
> > > --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
> > > @@ -3137,7 +3137,7 @@ static struct drm_dp_mst_port
> > > *drm_dp_get_last_connected_port_to_mstb(struct drm static struct
> > > drm_dp_mst_branch
> * drm_dp_get_last_connected_port_and_mstb(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > > struct drm_dp_mst_branch
> > > *mstb,
> > > - int *port_num)
> > > + struct drm_dp_mst_port
> > > **last_port)
> > > {
> > > struct drm_dp_mst_branch *rmstb = NULL;
> > > struct drm_dp_mst_port *found_port; @@ -3153,7 +3153,8 @@
> > > drm_dp_get_last_connected_port_and_mstb(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > >
> > > if (drm_dp_mst_topology_try_get_mstb(found_port-
> > > > parent)) {
> > > rmstb = found_port->parent;
> > > - *port_num = found_port->port_num;
> > > + *last_port = found_port;
> > > + drm_dp_mst_get_port_malloc(found_port);
> > > } else {
> > > /* Search again, starting from this parent
> > > */
> > > mstb = found_port->parent; @@ -3170,7
> > > +3171,7 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > > int pbn)
> > > {
> > > struct drm_dp_sideband_msg_tx *txmsg;
> > > - struct drm_dp_mst_branch *mstb;
> > > + struct drm_dp_mst_branch *mstb = NULL;
> > > int ret, port_num;
> > > u8 sinks[DRM_DP_MAX_SDP_STREAMS];
> > > int i;
> > > @@ -3178,12 +3179,22 @@ static int drm_dp_payload_send_msg(struct
> > > drm_dp_mst_topology_mgr *mgr,
> > > port_num = port->port_num;
> > > mstb = drm_dp_mst_topology_get_mstb_validated(mgr, port-
> > > > parent);
> > > if (!mstb) {
> > > - mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > - port-
> > > >parent,
> > > -
> > > &port_num);
> > > + struct drm_dp_mst_port *rport = NULL;
> > > + bool ddps;
> > >
> > > + mstb = drm_dp_get_last_connected_port_and_mstb(mgr,
> > > port->parent,
> > > +&rport);
> > > if (!mstb)
> > > return -EINVAL;
> > > +
> > > + ddps = rport->ddps;
> > > + port_num = rport->port_num;
> > > + drm_dp_mst_put_port_malloc(rport);
> > > +
> > > + /* If the port is currently marked as disconnected,
> > > +don't
> > > send
> > > a payload message */
> > > + if (!ddps) {
> > Hi Lyude,
> >
> > Thanks for driving this!
> > Shouldn't we still send ALLOCATE_PAYLOAD with PBN 0 to the last
> > connected Port even its peer device is disconnected? We rely on this
> > "path msg" to update all payload ID tables along the virtual payload
> > channel.
> >
>
> Do you know if there's any devices that break with this change, btw? Would
> be super useful to know imho, and if so I might be alright with dropping it
> depending on what the answer to the next paragraph is.
>
Hi Lyude,
Thanks for your time and sorry for late response!
It's described in 5.6.1.3 of DP spec 2.0:
"MST branch device, in addition to waiting for the ACK from its immediate
Upstream device, should either wait for the ALLOCATE_PAYLOAD message
transaction with a PBN value equal to 0 from the MST Source device for
de-allocating the time slot assigned to the VC Payload that is routed to the
unplugged DFP or for 2 seconds, whichever occurs first."
> > commit 3769e4c0af5b ("drm/dp_mst: Avoid to mess up payload table by
> > ports in stale topology") was trying to skip updating payload for a
> > target which is no longer existing in the current topology rooted at
> > mgr->mst_primary. I passed "mgr->mst_primary" to
> > drm_dp_mst_port_downstream_of_branch() previously.
> > Sorry, I might not fully understand the issue you've seen. Could you
> > elaborate on this more please?
> >
> > Thanks!
>
> I will have to double check this since it's been a month, but basically - the idea
> of having the topology references in the first place was to be the one check
> for figuring out whether something's in a topology or not. I've been thinking
> of maybe trying to replace it at some point, but I think we'd want to do it all
> over the helpers instead of just in certain spots.
>
> The other thing I noticed was that when I was rewriting this code, I noticed it
> seemed a lot like we had misunderstood the issue that was causing leaks in
> the first place. The BAD_PARAM we noticed indicates the payload we're
> trying to remove on the other end doesn't exist anymore, meaning the
> branch device in question got rid of any payloads it had active in response to
> the CSN. In testing though I found that payloads would be automatically
> released in situations where the last reachable port was marked as
> disconnected via a previous CSN, but was still reachable otherwise, and not in
> any other situation. This also seemed to match up with the excerpts in the DP
> spec that I found, so I assumed it was probably correct.
IMHO, the main root cause with the commit 3769e4c0af5b ("drm/dp_mst: Avoid
to mess up payload table by ports in stale topology") is like what described in the
commit message. The problem I encountered was when I unplugged the primary
mst branch device from the system, upper layer didn't try to release stale streams
immediately. Instead, it started to gradually release stale streams when I plugged the
mst hub back to the system. In that case, if we didn't do the check to see whether
the current request for deallocating payload is for this time topology instance,
i.e. might be for the stale topology before I unplug, this deallocation will mess up
payload allocation for new topology instance.
As for the CSN, it's a node broadcast request message and not a path message.
Referring to 2.14.6.1 of DP 2.0 spec:
"If the broadcast message is a node request, only the end devices, DP MST
Source or Sink devices (or DP MST Branch device if Source/Sink are not plugged),
process the request."
IMHO, payload should be controlled by source only, by ALLOCATE_PAYLOAD or
CLEAR_PAYLAOD_ID_TABLE message.
>
> Also, I think using the DDPS field instead of trying to traverse the topology
> state (which might not have been fully updated yet in response to CSNs)
> might be a slightly better idea since DDPS may end up being updated before
> the port has been removed from our in-memory topology, which is kind of
Thank you Lyude! Just want to confirm with you the below idea to see if I
understand it correctly.
The flow I thought would be (from Source perspective):
Receive CSN for notifying disconnection event => update physical topology
connection status (e.g. DDPS, put topology krefcount..) => send hotplug event to
userspace => userspace asks deallocating payloads for disconnected stream
sinks => put malloc krefcount of disconnected ports/mstbs => remove ports/mstb
from in-memory topology.
I suppose physical topology connection status is updated before sending hotplug
event to userspace and the in-memory topology still can be referred for stale
connection status before payload deallocation completes, i.e. which will put
malloc krefcount to eventually destroy disconnected devices in topology in-memory.
I mean, ideally, sounds like the topology in-memory should be reliable when
we send ALLOCATE_PAYLOAD as PBN=0. But I understand it definitely is not the
case if we have krefcount leak.
Appreciate for your time and help Lyude!
> one of the reasons I've been considering trying to come up with a better
> solution then topology references someday (unfortunately it works 'good
> enough' for the most part, so definitely not a priority). This is 100% a guess on
> my part though.
>
> > > + ret = -EINVAL;
> > > + goto fail_put;
> > > + }
> > > }
> > >
> > > txmsg = kzalloc(sizeof(*txmsg), GFP_KERNEL); @@ -3384,7
> > > +3395,6 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr, int start_s
> > > struct drm_dp_mst_port *port;
> > > int i, j;
> > > int cur_slots = start_slot;
> > > - bool skip;
> > >
> > > mutex_lock(&mgr->payload_lock);
> > > for (i = 0; i < mgr->max_payloads; i++) { @@ -3399,16
> > > +3409,6 @@ int drm_dp_update_payload_part1(struct
> > > drm_dp_mst_topology_mgr *mgr, int start_s
> > > port = container_of(vcpi, struct
> > > drm_dp_mst_port,
> > > vcpi);
> > >
> > > - mutex_lock(&mgr->lock);
> > > - skip
> > > = !drm_dp_mst_port_downstream_of_branch(port, mgr->mst_primary);
> > > - mutex_unlock(&mgr->lock);
> > > -
> > > - if (skip) {
> > > - drm_dbg_kms(mgr->dev,
> > > - "Virtual channel %d is
> > > not in current topology\n",
> > > - i);
> > > - continue;
> > > - }
> > > /* Validated ports don't matter if we're
> > > releasing
> > > * VCPI
> > > */
> > > @@ -3509,7 +3509,6 @@ int drm_dp_update_payload_part2(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > > struct drm_dp_mst_port *port;
> > > int i;
> > > int ret = 0;
> > > - bool skip;
> > >
> > > mutex_lock(&mgr->payload_lock);
> > > for (i = 0; i < mgr->max_payloads; i++) { @@ -3519,13
> > > +3518,6 @@ int drm_dp_update_payload_part2(struct
> > > drm_dp_mst_topology_mgr *mgr)
> > >
> > > port = container_of(mgr->proposed_vcpis[i], struct
> > > drm_dp_mst_port, vcpi);
> > >
> > > - mutex_lock(&mgr->lock);
> > > - skip = !drm_dp_mst_port_downstream_of_branch(port,
> > > mgr->mst_primary);
> > > - mutex_unlock(&mgr->lock);
> > > -
> > > - if (skip)
> > > - continue;
> > > -
> > > drm_dbg_kms(mgr->dev, "payload %d %d\n", i, mgr-
> > > > payloads[i].payload_state);
> > > if (mgr->payloads[i].payload_state ==
> > > DP_PAYLOAD_LOCAL) {
> > > ret = drm_dp_create_payload_step2(mgr, port,
> > > mgr-
> > > > proposed_vcpis[i]->vcpi, &mgr->payloads[i]); @@ -4780,18 +4772,9
> > > > @@
> > > EXPORT_SYMBOL(drm_dp_mst_reset_vcpi_slots);
> > > void drm_dp_mst_deallocate_vcpi(struct drm_dp_mst_topology_mgr
> > > *mgr,
> > > struct drm_dp_mst_port *port)
> > > {
> > > - bool skip;
> > > -
> > > if (!port->vcpi.vcpi)
> > > return;
> > >
> > > - mutex_lock(&mgr->lock);
> > > - skip = !drm_dp_mst_port_downstream_of_branch(port, mgr-
> > > > mst_primary);
> > > - mutex_unlock(&mgr->lock);
> > > -
> > > - if (skip)
> > > - return;
> > > -
> > > drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
> > > port->vcpi.num_slots = 0;
> > > port->vcpi.pbn = 0;
> > > --
> > > 2.35.3
> > --
> > Wayne Lin
>
> --
> Cheers,
> Lyude Paul (she/her)
> Software Engineer at Red Hat
--
Regards,
Wayne Lin