Re: [PATCH v3] fs/proc: Expose RSEQ configuration

From: Piotr Figiel
Date: Wed Jan 27 2021 - 10:21:07 EST


On Tue, Jan 26, 2021 at 03:58:46PM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 26, 2021, at 1:54 PM, Piotr Figiel figiel@xxxxxxxxxx wrote:
> [...]
> > diff --git a/kernel/rseq.c b/kernel/rseq.c
> > index a4f86a9d6937..6aea67878065 100644
> > --- a/kernel/rseq.c
> > +++ b/kernel/rseq.c
> > @@ -322,8 +322,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32,
> > rseq_len,
> > ret = rseq_reset_rseq_cpu_id(current);
> > if (ret)
> > return ret;
> > + task_lock(current);
> > current->rseq = NULL;
> > current->rseq_sig = 0;
> > + task_unlock(current);
> > return 0;
> > }
> >
> > @@ -353,8 +355,10 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32,
> > rseq_len,
> > return -EINVAL;
> > if (!access_ok(rseq, rseq_len))
> > return -EFAULT;
> > + task_lock(current);
> > current->rseq = rseq;
> > current->rseq_sig = sig;
> > + task_unlock(current);
>
> So AFAIU, the locks are there to make sure that whenever a user-space
> thread reads that state through that new /proc file ABI, it observes
> coherent "rseq" vs "rseq_sig" values.

Yes, that was the intention.

> However, I'm not convinced this is the right approach to consistency
> here.
>
> Because if you add locking as done here, you ensure that the /proc
> file reader sees coherent values, but between the point where those
> values are read from kernel-space, copied to user-space, and then
> acted upon by user-space, those can very well have become outdated if
> the observed process runs concurrently.

You are right here, but I think this comment is valid for most of the
process information exported via procfs. The user can almost always make
a time of check/time of use race when interacting with procfs. I agree
that the locking added in v3 doesn't help much, but at least it does
provide a well defined answer: i.e. at least in some point of time the
effective configuration was as returned.
It makes it a bit easier to document and reason about the file contents,
compared to the inconsistent version.

> So my understanding here is that the only non-racy way to effectively
> use those values is to either read them from /proc/self/* (from the
> thread owning the task struct), or to ensure that the thread is
> stopped/frozen while the read is done.

Constraining this solely to the owning thread I think is a bit too
limiting. I think we could limit it to stopped threads but I don't think
it eliminates the potential of time of check/time of use races for the
user. In this shape as in v3 - it's up to the user to decide if there
is a relevant risk of a race, if it's unwanted then the thread can be
stopped with e.g. ptrace, cgroup freeze or SIGSTOP.

Best regards,
Piotr.