Re: [RFC PATCH 1/8] signal: Dequeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is set

From: Christian Brauner
Date: Thu May 18 2023 - 04:08:19 EST


On Wed, May 17, 2023 at 07:09:13PM -0500, Mike Christie wrote:
> This has us deqeue SIGKILL even if SIGNAL_GROUP_EXIT/group_exec_task is
> set when we are dealing with PF_USER_WORKER tasks.
>
> When a vhost_task gets a SIGKILL, we could have outstanding IO in flight.
> We can easily stop new work/IO from being queued to the vhost_task, but
> for IO that's already been sent to something like the block layer we
> need to wait for the response then process it. These type of IO
> completions use the vhost_task to process the completion so we can't
> exit immediately.
>
> We need to handle wait for then handle those completions from the
> vhost_task, but when we have a SIGKLL pending, functions like
> schedule() return immediately so we can't wait like normal. Functions
> like vhost_worker() degrade to just a while(1); loop.
>
> This patch has get_signal drop down to the normal code path when
> SIGNAL_GROUP_EXIT/group_exec_task is set so the caller can still detect
> there is a SIGKILL but still perform some blocking cleanup.
>
> Note that in that chunk I'm now bypassing that does:
>
> sigdelset(&current->pending.signal, SIGKILL);
>
> we look to be ok, because in the places we set SIGNAL_GROUP_EXIT/
> group_exec_task we are already doing that on the threads in the
> group.
>
> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
> ---

I think you just got confused by the original discussion that was split
into two separate threads:

(1) The discussion based on your original proposal to adjust the signal
handling logic to accommodate vhost workers as they are right now.
That's where Oleg jumped in.
(2) My request - which you did in this series - of rewriting vhost
workers to behave more like io_uring workers.

Both problems are orthogonal. The gist of my proposal is to avoid (1) by
doing (2). So the only change that's needed is
s/PF_IO_WORKER/PF_USER_WORKER/ which is pretty obvious as io_uring
workers and vhost workers no almost fully collapse into the same
concept.

So forget (1). If additional signal patches are needed as discussed in
(1) then it must be because of a bug that would affect io_uring workers
today.