Re: [PATCH] drm/msm/dp: move link_ready out of HPD event thread

From: Abhinav Kumar
Date: Mon Mar 18 2024 - 14:02:20 EST




On 3/15/2024 8:57 AM, Johan Hovold wrote:
On Thu, Mar 14, 2024 at 09:30:57AM -0700, Abhinav Kumar wrote:
On 3/14/2024 8:38 AM, Johan Hovold wrote:
On Wed, Mar 13, 2024 at 10:24:08AM -0700, Abhinav Kumar wrote:

Perhaps I'm missing something in the race that you are trying to
describe (and which I've asked you to describe in more detail so that I
don't have to spend more time trying to come up with a reproducer
myself).

The race condition is between the time we get disconnect event and set
link_ready to false, a commit can come in. Because setting link_ready to
false happens in the event thread so it could be slightly delayed.

I get this part, just not why, or rather when, that becomes a problem.

Once the disconnect event is processed, dp_hpd_unplug_handle() will
update the state to ST_DISCONNECT_PENDING, and queue a notification
event. link_ready is (before this patch) still set to 1.


This is the case I am thinking of:

1) Disconnect event happens which will call dp_hpd_unplug_handle() but link_ready is not false yet.

2) There is a commit with a modeset, which shall trigger atomic_disable() followed by an atomic_enable()

atomic_disable() will go through disable clocks and set hpd_state to ST_DISCONNECTED.

3) atomic_enable() will not go through because we will bail out because state was ST_DISCONNECTED.

if (state != ST_DISPLAY_OFF && state != ST_MAINLINK_READY) {
mutex_unlock(&dp_display->event_mutex);
return;
}

4) Now, if there is another commit with a modeset, it will go and crash at atomic_disable()

Here a commit comes in; what exactly are you suggesting would trigger
that? And in such a way that it breaks the state machine?


Like we have seen, the commit can either come directly from userspace as one last frame (the original bug I had given the link to) or from the __drm_fb_helper_restore_fbdev_mode_unlocked() which happened in sc8280xp's case. This is totally independent of the hpd_thread() with no mutual exclusion.

This commit() can come before the link_ready was set to false. If it had come after link_ready was set to false, atomic_check() would have failed and no issue would have been seen.

My change is making the link_ready false sooner in the disconnect case.

One way this could cause trouble is if you end up with a call to
dp_bridge_atomic_post_disable() which updates the state to
ST_DISCONNECTED. (1)

This would then need to be followed by another call to
dp_bridge_atomic_enable() which bails out early with the link clock
disabled. (2) (And if link_ready were to be set to 0 sooner, the
likelihood of this is reduced.)

This in turn, would trigger a reset when dp_bridge_atomic_disable() is
later called.


Yes, this is exactly what I have written above.

This is the kind of description of the race I expect to see in the
commit message, and I'm still not sure what would trigger the call to
dp_bridge_atomic_post_disable() and dp_bridge_atomic_enable() (i.e. (1)
and (2) above) and whether this is a real issue or not.


I have explained what triggers the disable/enable call below.

Also note that the above scenario is quite different from the one I've
hit and described earlier.


Why is that so? Eventually it will also translate to the same scenario. I would like to understand why this is different. I think in your case, probably we do not know what triggers the modeset, but its a minor detail like I have written before.

It will be hard to reproduce this. Only way I can think of is to delay
the EV_NOTIFICATION for sometime and see in dp_bridge_hpd_notify()

else if (dp_display->link_ready && status ==
connector_status_disconnected)
dp_add_event(dp, EV_HPD_UNPLUG_INT, 0, 0);

as dp_add_event() will add the event, then wakeup the event_q.

Sure that would increase the race window with the current code, but that
alone isn't enough to trigger the bug AFAICT.

Before the event thread wakes up and processes this unplug event, the
commit can come in. This is the race condition i was thinking of.

Yes, but what triggers the commit? And why would it lead to a mode set
that disables the bridge?


Commit was triggered from the userspace as it did not process the disconnect event on time and the userspace was triggering a couple of modesets by by changing the mode on the CRTC from 1080P to NONE to 1080P.

[drm:drm_atomic_helper_check_modeset] [CRTC:60:crtc-1] mode changed

Johan