On Wed, Nov 27, 2019 at 9:48 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 11/27/19 12:23 PM, Jann Horn wrote:
On Wed, Nov 27, 2019 at 6:11 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
I posted this a few weeks back, took another look at it and refined it a
bit. I'd like some input on the viability of this approach.
A new signalfd setup flag is added, SFD_TASK. This is only valid if used
with SFD_CLOEXEC. If set, the task setting up the signalfd descriptor is
remembered in the signalfd context, and will be the one we use for
checking signals in the poll/read handlers in signalfd.
This is needed to make signalfd useful with io_uring and aio, of which
the former in particular has my interest.
I _think_ this is sane. To prevent the case of a task clearing O_CLOEXEC
on the signalfd descriptor, forking, and then exiting, we grab a
reference to the task when we assign it. If that original task exits, we
catch it in signalfd_flush() and ensure waiters are woken up.
Mh... that's not really reliable, because you only get ->flush() from
the last exiting thread (or more precisely, the last exiting task that
shares the files_struct).
What is your goal here? To have a reference to a task without keeping
the entire task_struct around in memory if someone leaks the signalfd
to another process - basically like a weak pointer? If so, you could
store a refcounted reference to "struct pid" instead of a refcounted
reference to the task_struct, and then do the lookup of the
task_struct on ->poll and ->read (similar to what procfs does).
Yeah, I think that works out much better (and cleaner). How about this,
then? Follows your advice and turns it into a struct pid instead. I
don't particularly like the -ESRCH in dequeue and setup, what do you
think? For poll, POLLERR seems like a prudent choice.
-ESRCH may be kinda weird, but I also can't think of anything
better... and it does describe the problem pretty accurately: The task
whose signal state you're trying to inspect is gone. I went through
the list of errnos, and everything else sounded more weird...
One more thing, though: We'll have to figure out some way to
invalidate the fd when the target goes through execve(), in particular
if it's a setuid execution. Otherwise we'll be able to just steal
signals that were intended for the other task, that's probably not
good.
So we should:
a) prevent using ->wait() on an old signalfd once the task has gone
through execve()
b) kick off all existing waiters
c) most importantly, prevent ->read() on an old signalfd once the
task has gone through execve()
We probably want to avoid using the cred_guard_mutex here, since it is
quite broad and has some deadlocking issues; it might make sense to
put the update of ->self_exec_id in fs/exec.c under something like the
siglock, and then for a) and c) we can check whether the
->self_exec_id changed while holding the siglock, and for b) we can
add a call to signalfd_cleanup() after the ->self_exec_id change.
+static void signalfd_put_task(struct signalfd_ctx *ctx, struct task_struct *tsk)
+{
+ if (ctx->task_pid)
+ put_task_struct(tsk);
+}
+
+static struct task_struct *signalfd_get_task(struct signalfd_ctx *ctx)
+{
+ if (ctx->task_pid)
+ return get_pid_task(ctx->task_pid, PIDTYPE_PID);
+
+ return current;
+}
This works, and I guess it's a question of coding style... but I'd
kinda prefer to do the refcount operation in both cases, so that the
semantics of the returned reference are simply "holds a reference"
instead of "either holds a reference or borrows from current depending
on ctx->task_pid". But if you feel strongly about it, feel free to
keep it as-is.
- add_wait_queue(¤t->sighand->signalfd_wqh, &wait);
+ add_wait_queue(&tsk->sighand->signalfd_wqh, &wait);
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
- ret = dequeue_signal(current, &ctx->sigmask, info);
+ ret = dequeue_signal(tsk, &ctx->sigmask, info);
if (ret != 0)
break;
if (signal_pending(current)) {
ret = -ERESTARTSYS;
break;
}
- spin_unlock_irq(¤t->sighand->siglock);
+ spin_unlock_irq(&tsk->sighand->siglock);
schedule();
Should we be dropping the reference to the task before schedule() and
re-acquiring it afterwards so that if we're blocked on a signalfd read
and then the corresponding task dies, the refcount can drop to zero
and we can get woken up? Probably doesn't matter, but seems a bit
cleaner to me.
- spin_lock_irq(¤t->sighand->siglock);
+ spin_lock_irq(&tsk->sighand->siglock);
}
- spin_unlock_irq(¤t->sighand->siglock);
+ spin_unlock_irq(&tsk->sighand->siglock);
- remove_wait_queue(¤t->sighand->signalfd_wqh, &wait);
+ remove_wait_queue(&tsk->sighand->signalfd_wqh, &wait);
__set_current_state(TASK_RUNNING);
+ signalfd_put_task(ctx, tsk);
return ret;
}
@@ -267,19 +296,24 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
/* Check the SFD_* constants for consistency. */
BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
+ BUILD_BUG_ON(SFD_TASK & (SFD_CLOEXEC | SFD_NONBLOCK));
- if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
+ if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_TASK))
+ return -EINVAL;
+ if ((flags & (SFD_CLOEXEC | SFD_TASK)) == SFD_TASK)
return -EINVAL;
(non-actionable comment: It seems kinda weird that you can specify
these parameters with no effect for the `uffd != -1` case... but since
the existing parameters already work that way, I guess it's
consistent.)