Re: [PATCH 1/1] drm/amdkfd: Protect the Client whilst it is being operated on

From: Felix Kuehling
Date: Wed Mar 23 2022 - 15:13:28 EST



On 2022-03-23 at 08:46, Lee Jones wrote:
On Thu, 17 Mar 2022, Lee Jones wrote:

On Thu, 17 Mar 2022, philip yang wrote:

On 2022-03-17 11:13 a.m., Lee Jones wrote:

On Thu, 17 Mar 2022, Felix Kuehling wrote:


On 2022-03-17 at 11:00, Lee Jones wrote:

Good afternoon Felix,

Thanks for your review.


On 2022-03-17 at 09:16, Lee Jones wrote:

Presently the Client can be freed whilst still in use.

Use the already provided lock to prevent this.

Cc: Felix Kuehling <Felix.Kuehling@xxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: "Christian König" <christian.koenig@xxxxxxx>
Cc: "Pan, Xinhui" <Xinhui.Pan@xxxxxxx>
Cc: David Airlie <airlied@xxxxxxxx>
Cc: Daniel Vetter <daniel@xxxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Lee Jones <lee.jones@xxxxxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index e4beebb1c80a2..3b9ac1e87231f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -145,8 +145,11 @@ static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
spin_unlock(&dev->smi_lock);
synchronize_rcu();
+
+ spin_lock(&client->lock);
kfifo_free(&client->fifo);
kfree(client);
+ spin_unlock(&client->lock);

The spin_unlock is after the spinlock data structure has been freed.

Good point.

If we go forward with this approach the unlock should perhaps be moved
to just before the kfree().
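
To illustrate the ordering problem raised above, here is a minimal userspace
sketch, not the kfd code. It assumes a hypothetical struct client with an
embedded lock (a C11 atomic_flag standing in for the kernel spinlock) and a
heap buffer standing in for the kfifo. The only point it makes is that the
lock must be dropped before the memory that contains it is freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SMI client; not the kernel structure. */
struct client {
	atomic_flag lock;	/* stands in for client->lock */
	int *fifo;		/* stands in for client->fifo */
};

static void client_lock(struct client *c)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void client_unlock(struct client *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

struct client *client_create(size_t entries)
{
	struct client *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	atomic_flag_clear(&c->lock);	/* start unlocked */
	c->fifo = calloc(entries, sizeof(*c->fifo));
	if (!c->fifo) {
		free(c);
		return NULL;
	}
	return c;
}

int client_release(struct client *c)
{
	assert(c != NULL);

	client_lock(c);
	free(c->fifo);
	c->fifo = NULL;
	client_unlock(c);	/* the lock lives inside *c: drop it first */

	free(c);		/* only now may the lock's memory go away */
	return 0;
}
```

Unlocking after free(c) would write to freed memory. As the rest of the thread
argues, reordering alone does not make the free safe if other users can still
reach the client.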


There should be no concurrent users here, since we are freeing the data
structure. If there still are concurrent users at this point, they will
crash anyway. So the locking is unnecessary.

The users may well crash, as does the kernel unfortunately.

We only get to kfd_smi_ev_release when the file descriptor is closed. User
mode has no way to use the client any more at this point. This function also
removes the client from the dev->smi_clients list. So no more events will
be added to the client. Therefore it is safe to free the client.

If any of the above were not true, it would not be safe to kfree(client).

But if it is safe to kfree(client), then there is no need for the locking.

I'm not keen to go into too much detail until it's been patched.

However, there is a way to free the client while it is still in use.

Remember we are multi-threaded.

files_struct->count refcount is used to handle this race, as
vfs_read/vfs_write takes file refcount and fput calls release only if
refcount is 1, to guarantee that read/write from user space is finished
here.
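
The reference-counting argument above can be sketched in userspace as
follows. This is an analogy only, not VFS code: struct file_like, file_get()
and file_put() are hypothetical names, and the released flag marks the point
where a release callback would run.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an open file; not the kernel's struct file. */
struct file_like {
	atomic_int refcount;
	int released;		/* set where ->release() would be called */
};

void file_init(struct file_like *f)
{
	atomic_init(&f->refcount, 1);	/* reference held by the open fd */
	f->released = 0;
}

/* Taken by an in-flight read or write before it touches the file. */
void file_get(struct file_like *f)
{
	atomic_fetch_add(&f->refcount, 1);
}

/* Dropped by close() and by each read or write when it finishes. */
void file_put(struct file_like *f)
{
	int old = atomic_fetch_sub(&f->refcount, 1);

	assert(old > 0);
	if (old == 1)
		f->released = 1;	/* last reference gone: release runs */
}
```

If a read takes a reference before another thread closes the descriptor, the
close only drops the count from 2 to 1. Release runs when the read drops the
final reference.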

Another race is the driver calling add_event_to_kfifo while the handle is
being closed. We use rcu_read_lock in add_event_to_kfifo, and
kfd_smi_ev_release calls synchronize_rcu to wait for all RCU read-side
critical sections to finish. So it is safe to call
kfifo_free(&client->fifo) and kfree(client).
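
The unpublish, wait, then free sequence described above can be sketched in
userspace as follows. This is not RCU: a plain counter of active readers
stands in for the grace period, and every name here (published, readers,
add_event, release_client) is hypothetical. It only shows why a reader that
still sees the client cannot have it freed underneath it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical client; events is atomic so that readers may run in parallel. */
struct client {
	atomic_int events;
};

static _Atomic(struct client *) published;	/* stands in for the client list */
static atomic_int readers;			/* readers inside a read section */

void publish(struct client *c)
{
	assert(c != NULL);
	atomic_store(&published, c);
}

/* Reader side, in the spirit of rcu_read_lock()/rcu_read_unlock(). */
int add_event(void)
{
	struct client *c;
	int delivered = 0;

	atomic_fetch_add(&readers, 1);		/* enter before loading the pointer */
	c = atomic_load(&published);
	if (c) {
		atomic_fetch_add(&c->events, 1);
		delivered = 1;
	}
	atomic_fetch_sub(&readers, 1);		/* leave the read section */
	return delivered;
}

/* Release side: unpublish, wait for readers to drain, then free. */
void release_client(void)
{
	struct client *c = atomic_exchange(&published, NULL);

	/* Crude stand-in for synchronize_rcu(): spin until no reader is active. */
	while (atomic_load(&readers) != 0)
		;
	free(c);
}
```

A reader that loaded the old pointer incremented readers first, so the release
side keeps spinning until that reader leaves. Readers that arrive after the
exchange see NULL and never touch the freed client.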

Philip, please reach out to Felix.

Philip, Felix, are you receiving my direct messages?

I have a feeling they're being filtered out by AMD's mail server.

I didn't get any direct messages. :/ I'll send you my private email address.

Regards,
  Felix