Re: [PATCH 1/1] drm/amdkfd: Protect the Client whilst it is being operated on

From: Lee Jones
Date: Thu Mar 17 2022 - 11:00:33 EST


Good afternoon Felix,

Thanks for your review.

> Am 2022-03-17 um 09:16 schrieb Lee Jones:
> > Presently the Client can be freed whilst still in use.
> >
> > Use the already provided lock to prevent this.
> >
> > Cc: Felix Kuehling <Felix.Kuehling@xxxxxxx>
> > Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> > Cc: "Christian König" <christian.koenig@xxxxxxx>
> > Cc: "Pan, Xinhui" <Xinhui.Pan@xxxxxxx>
> > Cc: David Airlie <airlied@xxxxxxxx>
> > Cc: Daniel Vetter <daniel@xxxxxxxx>
> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
> > ---
> > drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> > index e4beebb1c80a2..3b9ac1e87231f 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> > @@ -145,8 +145,11 @@ static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
> > spin_unlock(&dev->smi_lock);
> > synchronize_rcu();
> > +
> > + spin_lock(&client->lock);
> > kfifo_free(&client->fifo);
> > kfree(client);
> > + spin_unlock(&client->lock);
>
> The spin_unlock is after the spinlock data structure has been freed.

Good point.

If we go forward with this approach the unlock should perhaps be moved
to just before the kfree().

> There
> should be no concurrent users here, since we are freeing the data structure.
> If there still are concurrent users at this point, they will crash anyway.
> So the locking is unnecessary.

The users may well crash, as does the kernel unfortunately.

> > return 0;
> > }
> > @@ -247,11 +250,13 @@ int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
> > return ret;
> > }
> > + spin_lock(&client->lock);
>
> The client was just allocated, and it wasn't added to the client list or
> given to user mode yet. So there can be no concurrent users at this point.
> The locking is unnecessary.
>
> There could be potential issues if someone uses the file descriptor by dumb
> luck before this function returns. So maybe we need to move the
> anon_inode_getfd to the end of the function (just before list_add_rcu) so
> that we only create the file descriptor after the client structure is fully
> initialized.

Bingo. Well done. :)

I can move the function as suggested if that is the best route forward?

--
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog