Re: [PATCH 2/7] drm/encoder: drm_encoder_cleanup: take chain mutex while tearing down

From: Luca Ceresoli

Date: Mon Sep 29 2025 - 10:31:43 EST

Hi Maxime,

On Mon, 29 Sep 2025 14:45:10 +0200
Maxime Ripard <mripard@xxxxxxxxxx> wrote:

> On Fri, Sep 26, 2025 at 05:59:43PM +0200, Luca Ceresoli wrote:
> > drm_encoder_cleanup() modifies the encoder chain by removing bridges via
> > drm_bridge_detach(). Protect this whole operation by taking the mutex, so
> > any users iterating over the chain will not access it during the change.
> >
> > Signed-off-by: Luca Ceresoli <luca.ceresoli@xxxxxxxxxxx>
> > ---
> > drivers/gpu/drm/drm_encoder.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_encoder.c b/drivers/gpu/drm/drm_encoder.c
> > index 3261f142baea30c516499d23dbf8d0acf5952cd6..3a04bedf9b759acd6826864b7f2cc9b861a8f170 100644
> > --- a/drivers/gpu/drm/drm_encoder.c
> > +++ b/drivers/gpu/drm/drm_encoder.c
> > @@ -195,9 +195,11 @@ void drm_encoder_cleanup(struct drm_encoder *encoder)
> > * the indices on the drm_encoder after us in the encoder_list.
> > */
> >
> > + mutex_lock(&encoder->bridge_chain_mutex);
> > list_for_each_entry_safe(bridge, next, &encoder->bridge_chain,
> > chain_node)
> > drm_bridge_detach(bridge);
> > + mutex_unlock(&encoder->bridge_chain_mutex);
>
> You were claiming that the mutex was to prevent issues with concurrent
> iteration and removal of the list members. list_for_each_entry_safe() is
> explicitly made to protect against that. Why do we need both?

You're right saying we don't need both. With a mutex preventing the list
from any change, we can actually simpify code a bit to use the non-safe
list macro:

- struct drm_bridge *bridge, *next;
+ struct drm_bridge *bridge;
...
+ mutex_lock(&encoder->bridge_chain_mutex);
- list_for_each_entry_safe(bridge, next, &encoder->bridge_chain,
+ list_for_each_entry(bridge, &encoder->bridge_chain,
chain_node)
drm_bridge_detach(bridge);
+ mutex_unlock(&encoder->bridge_chain_mutex);

But it's not fully correct that list_for_each_entry_safe() protects
against concurrent removal. As I see it, all the _safe variants of
list_for_each*() macros protect against one specific and frequent use
case:

list_for_each_entry_safe(curs, n, some_list, membername) {
...
list_del(&curs->membername);
...
}

So the _safe variant protect from removing the element at the current
iteration (*curs). It does so by taking the following element pointer in
advance, in the auxiliary variable @n. But _concurrent_ removal (the
topic of this series) means the element being removed could just be the
following one.

Consider this sequence:

1. start loop: list_for_each_entry_safe() sets;
pos = list_first_entry() = <bridge 1>
n = list_next_entry(pos) = <bridge 2>
2. enter the loop 1st time, do something with *pos (bridge 1)
3. in the meanwhile bridge 2 is hot-unplugged
-> another thread removes item 2 -> drm_bridge_detach()
-> list_del() sets bridge->next = LIST_POISON1
4. loop iteration 1 finishes, list_for_each_entry_safe() sets:
pos = n (previously set to bridge 2)
n = (bridge 2)->next = LIST_POISON1
5. enter the loop 2nd time, do something with *pos (bridge 2)
6. loop iteration 2 finishes, list_for_each_entry_safe() sets:
pos = n (previously set to LIST_POISON1) -> bug

Do you think this explains the need for this series?

If it does, I should probably add this to the cover letter and maybe
patch 1.

Luca

--
Luca Ceresoli, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com