Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

From: Andrey Grodzovsky
Date: Tue Apr 24 2018 - 12:43:45 EST

On 04/24/2018 12:23 PM, Eric W. Biederman wrote:
Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> writes:

Avoid calling wait_event_killable when you are possibly being called
from get_signal routine since in that case you end up in a deadlock
where you are already blocked in signal processing and trying to wait
on a new signal.
I am curious what the call path is that is problematic here.

Here is the problematic call stack:

[<0>] drm_sched_entity_fini+0x10a/0x3a0 [gpu_sched]
[<0>] amdgpu_ctx_do_release+0x129/0x170 [amdgpu]
[<0>] amdgpu_ctx_mgr_fini+0xd5/0xe0 [amdgpu]
[<0>] amdgpu_driver_postclose_kms+0xcd/0x440 [amdgpu]
[<0>] drm_release+0x414/0x5b0 [drm]
[<0>] __fput+0x176/0x350
[<0>] task_work_run+0xa1/0xc0
[<0>] do_exit+0x48f/0x1280
[<0>] do_group_exit+0x89/0x140
[<0>] get_signal+0x375/0x8f0
[<0>] do_signal+0x79/0xaa0
[<0>] exit_to_usermode_loop+0x83/0xd0
[<0>] do_syscall_64+0x244/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

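On exit from a system call you process all the signals you received and encounter a fatal signal, which triggers process termination. As the stack shows, do_exit then runs the pending file release through task_work_run, so drm_sched_entity_fini is reached while the task is still inside get_signal.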

In general waiting seems wrong when the process has already been
fatally killed as indicated by PF_SIGNALED.

So indeed, this patch avoids the wait in this case.
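
To make the difference between the old and the new check concrete, here is a rough userspace model of the condition only. It is a sketch, not kernel code: struct model_task and the skips_wait_*() helpers are hypothetical names made up for illustration, and the MODEL_PF_SIGNALED value is copied from include/linux/sched.h as an assumption rather than included from it.

```c
#include <assert.h>
#include <signal.h>   /* SIGKILL, SIGSEGV */
#include <stdbool.h>
#include <stddef.h>   /* NULL */

/* Assumed to mirror PF_SIGNALED in the kernel ("killed by a signal"). */
#define MODEL_PF_SIGNALED 0x00000400u

/* Hypothetical stand-in for the two task_struct fields the check reads. */
struct model_task {
    unsigned int flags;
    int exit_code;
};

/* Old condition: skip the wait only when the fatal signal was SIGKILL. */
bool skips_wait_old(const struct model_task *t)
{
    assert(t != NULL);
    return (t->flags & MODEL_PF_SIGNALED) && t->exit_code == SIGKILL;
}

/* New condition: skip the wait for any task already killed by a signal. */
bool skips_wait_new(const struct model_task *t)
{
    assert(t != NULL);
    return (t->flags & MODEL_PF_SIGNALED) != 0;
}
```

For a task marked with the flag and an exit_code of SIGSEGV, skips_wait_old() returns false, so the old code would still enter wait_event_killable from the exit path, while skips_wait_new() returns true. For SIGKILL both agree, and for a task without the flag both return false and the wait proceeds as before.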

Returning -ERESTARTSYS seems wrong as nothing should make it back even
to the edge of userspace here.

Can you please clarify what should be returned here instead?


Given that this is the only use of PF_SIGNALED outside of bsd process
accounting I find this code very suspicious.

It looks like the code path that gets called during exit is buggy and needs
to be sorted out.


Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..09fd258 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -227,9 +227,10 @@ void drm_sched_entity_do_release(struct drm_gpu_scheduler *sched,
* The client will not queue more IBs during this fini, consume existing
- * queued IBs or discard them on SIGKILL
+ * queued IBs or discard them when in death signal state since
+ * wait_event_killable can't receive signals in that state.
- if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
+ if (current->flags & PF_SIGNALED)
entity->fini_status = -ERESTARTSYS;
entity->fini_status = wait_event_killable(sched->job_scheduled,