Re: [PATCH RFC] kernel: decouple TASK_WORK TWA_SIGNAL handling from signals

From: Thomas Gleixner
Date: Thu Oct 01 2020 - 11:19:15 EST


On Thu, Oct 01 2020 at 08:29, Jens Axboe wrote:
> This adds TIF_TASKWORK for x86, which if set, will return true on
> checking for pending signals. That in turn causes tasks to restart the
> system call, which will run the added task_work.

Huch? The syscall restart does not cause the task work to run.

> If TIF_TASKWORK is available, we'll use that for notification when
> TWA_SIGNAL is specified. If it isn't available, the existing
> TIF_SIGPENDING path is used.

Bah. Yet another TIF flag just because.

> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1767,7 +1767,7 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb,
> notify = TWA_SIGNAL;
>
> ret = task_work_add(tsk, cb, notify);
> - if (!ret)
> + if (!ret && !notify)

!notify assumes that TWA_RESUME == 0. Fun to debug if that ever changes.

> wake_up_process(tsk);
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -28,7 +28,6 @@ int
> task_work_add(struct task_struct *task, struct callback_head *work, int notify)
> {
> struct callback_head *head;
> - unsigned long flags;
>
> do {
> head = READ_ONCE(task->task_works);
> @@ -41,7 +40,10 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
> case TWA_RESUME:
> set_notify_resume(task);
> break;
> - case TWA_SIGNAL:
> + case TWA_SIGNAL: {
> +#ifndef TIF_TASKWORK
> + unsigned long flags;
> +
> /*
> * Only grab the sighand lock if we don't already have some
> * task_work pending. This pairs with the smp_store_mb()
> @@ -53,7 +55,12 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
> signal_wake_up(task, 0);
> unlock_task_sighand(task, &flags);
> }
> +#else
> + set_tsk_thread_flag(task, TIF_TASKWORK);
> + wake_up_process(task);
> +#endif

This is really a hack. TWA_SIGNAL is a misnomer with the new
functionality and combined with the above

if (!ret && !notify)
wake_up_process(tsk);

there is not really a big difference between TWA_RESUME and TWA_SIGNAL
anymore. Just the delivery mode and the syscall restart magic.

> static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> unsigned long ti_work)
> {
> + bool restart_sys = false;
> +
> /*
> * Before returning to user space ensure that all pending work
> * items have been completed.
> @@ -157,8 +159,13 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> if (ti_work & _TIF_PATCH_PENDING)
> klp_update_patch_state(current);
>
> + if (ti_work & _TIF_TASKWORK) {
> + task_work_run();
> + restart_sys = true;
> + }
> +
> if (ti_work & _TIF_SIGPENDING)
> - arch_do_signal(regs);
> + restart_sys |= !arch_do_signal(regs);
>
> if (ti_work & _TIF_NOTIFY_RESUME) {
> clear_thread_flag(TIF_NOTIFY_RESUME);
> @@ -178,6 +185,9 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> ti_work = READ_ONCE(current_thread_info()->flags);
> }
>
> + if (restart_sys)
> + arch_restart_syscall(regs);
> +

How is that supposed to work?

Assume that both TIF_TASKWORK and TIF_SIGPENDING are set, i.e. after
running task work and requesting syscall restart there is an actual
signal to be delivered.

This needs a lot more thoughts.

Thanks,

tglx