Re: [PATCH RFC v2] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals

From: Jens Axboe
Date: Fri Oct 02 2020 - 09:44:57 EST


On 10/2/20 7:38 AM, Hillf Danton wrote:
>
> On Thu, 1 Oct 2020 11:27:04 -0600 Jens Axboe wrote:
>> On 10/1/20 10:27 AM, Oleg Nesterov wrote:
>>> Jens,
>>>
>>> I'll read this version tomorrow, but:
>>>
>>> On 10/01, Jens Axboe wrote:
>>>>
>>>> static inline int signal_pending(struct task_struct *p)
>>>> {
>>>> - return unlikely(test_tsk_thread_flag(p,TIF_SIGPENDING));
>>>> +#ifdef TIF_TASKWORK
>>>> + /*
>>>> + * TIF_TASKWORK isn't really a signal, but it requires the same
>>>> + * behavior of restarting the system call to force a kernel/user
>>>> + * transition.
>>>> + */
>>>> + return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING) ||
>>>> + test_tsk_thread_flag(p, TIF_TASKWORK));
>>>> +#else
>>>> + return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>>>> +#endif
>>>
>>> This change alone is already very wrong.
>>>
>>> signal_pending(task) == T means that this task will do get_signal() as
>>> soon as it can, and this basically means you can't "divorce" SIGPENDING
>>> and TASKWORK.
>>>
>>> Simple example. Suppose we have a single-threaded task T.
>>>
>>> Someone does task_work_add(T, TWA_SIGNAL). This makes signal_pending()==T
>>> and this is what we need.
>>>
>>> Now suppose that another task sends a signal to T before T calls
>>> task_work_run() and clears TIF_TASKWORK. In this case SIGPENDING won't
>>> be set because signal_pending() is already set (see wants_signal), and
>>> this means that T won't notice this signal.
>>
>> That's a good point, and I have been thinking along those lines. The
>> "problem" is the two different use cases:
>>
>> 1) The "should I return from schedule() or break out of schedule() loops
>> kind of use cases".
>>
>> 2) Internal signal delivery use cases.
>>
>> The former wants one that factors in TIF_TASKWORK, while the latter
>> should of course only look at TIF_SIGPENDING.
>>
>> Now, my gut reaction would be to have __signal_pending() that purely
>> checks for TIF_SIGPENDING, and make sure we use that on the signal
>> delivery side of things. Or something with a better name than that, but
>> functionally the same. Ala:
>>
>> static inline int __signal_pending(struct task_struct *p)
>> {
>> return unlikely(test_tsk_thread_flag(p, TIF_SIGPENDING));
>> }
>>
>> static inline int signal_pending(struct task_struct *p)
>> {
>> #ifdef TIF_TASKWORK
>> return unlikely(test_tsk_thread_flag(p, TIF_TASKWORK)||
>> __signal_pending(p));
>> #else
>> return __signal_pending(p));
>> #endif
>> }
>>
>> and then use __signal_pending() on the signal delivery side.
>>
>> It's still not great in the sense that renaming signal_pending() would
>> be a better choice, but that's a whole lot of churn...
>
> To avoid that churn, IIUC replace TWA_SIGNAL with TWA_RESUME on
> adding task work, which is compensated by adding a counter of
> event source in IO ctx and waiting for event to arrive instead
> of signal.

That doesn't work. If the task is waiting in cqring_wait(), then
there's no issue already. The problem is if it's waiting somewhere
else.

Imagine three threads, call them T1-3. T1 creates a pipe, and creates
a ring. T1 queues a poll request for the read end of the pipe, and now
does a wait for T2. T2 is a completer thread, so it ends up waiting
for events on the ring. T2 is now in cqring_wait(). T3 is created,
and it writes to the pipe. This write triggers the original poll
request from T1, and task_work is now queued for T1. This task work
needs to be processed for T2 to wakeup and complete, but it can't
since T1 is in pthread_join() for T2.

This is why TWA_SIGNAL is needed, we need it to break the T1 wait
loop and process this work. No amount of changes in io_uring can
fix this dependency, and if you look at the last series posted,
it does in fact not even have any io_uring changes.

Hence the goal is to have TWA_SIGNAL have the same kind of semantics
it does now, but decoupled from ->sighand since that is problematic
on particularly threaded setups.

--
Jens Axboe