Re: [RFC PATCH v0.1 4/9] sched/umcg: implement core UMCG API
From: Peter Zijlstra
Date: Wed Jun 09 2021 - 09:02:34 EST
On Fri, May 21, 2021 at 11:33:14PM +0200, Jann Horn wrote:
> > SYSCALL_DEFINE2(umcg_wake, u32, flags, u32, next_tid)
> > {
> > - return -ENOSYS;
> > + struct umcg_task_data *next_utd;
> > + struct task_struct *next;
> > + int ret = -EINVAL;
> > +
> > + if (!next_tid)
> > + return -EINVAL;
> > + if (flags)
> > + return -EINVAL;
> > +
> > + next = find_get_task_by_vpid(next_tid);
> > + if (!next)
> > + return -ESRCH;
> > + rcu_read_lock();
>
> Wouldn't it be more efficient to replace the last 4 lines with the following?
>
> rcu_read_lock();
> next = find_task_by_vpid(next_tid);
> if (!next) {
> err = -ESRCH;
> goto out;
> }
This wakeup crud needs to modify the umcg->state, which is a user
variable. That can't be done under RCU. Weirdly the proposed code
doesn't actually do any of that for undocumented raisins :/
> Then you don't need to use refcounting here...
>
> > + next_utd = rcu_dereference(next->umcg_task_data);
> > + if (!next_utd)
> > + goto out;
> > +
> > + if (!READ_ONCE(next_utd->in_wait)) {
> > + ret = -EAGAIN;
> > + goto out;
> > + }
> > +
> > + ret = wake_up_process(next);
> > + put_task_struct(next);
>
> ... and you'd be able to drop this put_task_struct(), too.
>
> > + if (ret)
> > + ret = 0;
> > + else
> > + ret = -EAGAIN;
> > +
> > +out:
> > + rcu_read_unlock();
> > + return ret;
> > }
> >
> > /**
> > @@ -139,5 +325,44 @@ SYSCALL_DEFINE2(umcg_wake, u32, flags, u32, next_tid)
> > SYSCALL_DEFINE4(umcg_swap, u32, wake_flags, u32, next_tid, u32, wait_flags,
> > const struct __kernel_timespec __user *, timeout)
> > {
> > - return -ENOSYS;
> > + struct umcg_task_data *curr_utd;
> > + struct umcg_task_data *next_utd;
> > + struct task_struct *next;
> > + int ret = -EINVAL;
> > +
> > + rcu_read_lock();
> > + curr_utd = rcu_dereference(current->umcg_task_data);
> > +
> > + if (!next_tid || wake_flags || wait_flags || !curr_utd)
> > + goto out;
> > +
> > + if (timeout) {
> > + ret = -EOPNOTSUPP;
> > + goto out;
> > + }
> > +
> > + next = find_get_task_by_vpid(next_tid);
> > + if (!next) {
> > + ret = -ESRCH;
> > + goto out;
> > + }
>
> There isn't any type of access check here, right? Any task can wake up
> any other task? That feels a bit weird to me - and if you want to keep
> it as-is, it should probably at least be documented that any task on
> the system can send you spurious wakeups if you opt in to umcg.
You can only send wakeups to other UMCG thingies, per the
next->umcg_task_data check below. That said..
> In contrast, shared futexes can avoid this because they get their
> access control implicitly from the VMA.
Every task must expect spurious wakups at all times, always (for
TASK_NORMAL wakeups that is). There's plenty ways to generate them.
> > + next_utd = rcu_dereference(next->umcg_task_data);
> > + if (!next_utd) {
> > + ret = -EINVAL;
> > + goto out;
> > + }