Re: [PATCH 2/3] drm/scheduler: Don't call wait_event_killable for signaled process.

From: Andrey Grodzovsky
Date: Mon Apr 30 2018 - 12:10:55 EST




On 04/30/2018 12:00 PM, Oleg Nesterov wrote:
On 04/30, Andrey Grodzovsky wrote:
What about changing PF_SIGNALED to PF_EXITING in
drm_sched_entity_do_release

-       if ((current->flags & PF_SIGNALED) && current->exit_code == SIGKILL)
+       if ((current->flags & PF_EXITING) && current->exit_code == SIGKILL)
Let me repeat, please don't use task->exit_code. And in fact this check is racy.

But this doesn't matter. Say, we can trivially add SIGNAL_GROUP_KILLED_BY_SIGKILL,
or do something else,


Can you explain where the race is and what a possible alternative would be?

but I fail to understand what you are trying to do. Suppose
that the check above is correct in that it is true iff the task is exiting and
it was killed by SIGKILL. What about the "else" branch which does

r = wait_event_killable(sched->job_scheduled, ...)

?

Once again, fatal_signal_pending() (or even signal_pending()) is not well defined
after the exiting task passes exit_signals().

So wait_event_killable() can fail because fatal_signal_pending() is true; and this
can happen even if it was not killed.

Or it can block and SIGKILL won't be able to wake it up.

If SIGINT was sent then it's SIGINT,
Yes, but see above. In this case fatal_signal_pending() will likely be true, so
wait_event_killable() will fail unless the condition is already true.

My bad, I didn't show the full intended fix. It was just a snippet to address the differentiation between exiting
due to SIGKILL and any other exit. I also intended to change wait_event_killable to wait_event_timeout.
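
To make the combined idea concrete, here is a rough userspace model of the control flow, not kernel code and not the actual patch. Every name in it (model_task, model_entity, model_wait_idle_timeout, model_entity_do_release) is hypothetical. The killed_by_sigkill field only stands in for something like the SIGNAL_GROUP_KILLED_BY_SIGKILL flag suggested above, which does not exist today. The wait helper imitates the return convention of wait_event_timeout(): 0 if the condition is still false when the timeout elapses, otherwise a positive value.

```c
/* Userspace model only: it mimics the control flow, not the kernel APIs. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state discussed in the thread. */
struct model_task {
    bool exiting;            /* models PF_EXITING */
    bool killed_by_sigkill;  /* models a flag like SIGNAL_GROUP_KILLED_BY_SIGKILL */
};

/* Hypothetical stand-in for a scheduler entity's job queue. */
struct model_entity {
    int pending_jobs;        /* jobs still queued */
    int jobs_per_tick;       /* jobs the scheduler retires per tick */
};

/*
 * Models wait_event_timeout(): returns 0 if the queue is still busy when
 * the timeout elapses, otherwise the remaining ticks (at least 1).
 */
static long model_wait_idle_timeout(struct model_entity *e, long timeout)
{
    assert(timeout > 0);

    while (timeout > 0) {
        if (e->pending_jobs == 0)
            return timeout;              /* condition became true in time */

        /* One tick passes; the scheduler retires some jobs. */
        int done = e->jobs_per_tick < e->pending_jobs
                 ? e->jobs_per_tick : e->pending_jobs;
        e->pending_jobs -= done;
        timeout--;
    }

    /* Timeout elapsed: report 1 if the condition is true now, else 0. */
    return e->pending_jobs == 0 ? 1 : 0;
}

/*
 * Returns true if the queue drained, false if the caller has to discard
 * whatever is still queued.
 */
static bool model_entity_do_release(const struct model_task *t,
                                    struct model_entity *e, long timeout)
{
    /* Killed by SIGKILL: do not wait at all. */
    if (t->exiting && t->killed_by_sigkill)
        return e->pending_jobs == 0;

    /* Any other exit: bounded wait that does not depend on signal state. */
    return model_wait_idle_timeout(e, timeout) > 0;
}
```

The point of the bounded wait in the "else" branch is that it no longer relies on fatal_signal_pending(), so it can neither fail spuriously nor block forever after exit_signals(). How the SIGKILL case is detected reliably is still the open question.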

Andrey


Oleg.