Re: [PATCH RFC] sched: Add a per-thread core scheduling interface

From: Joel Fernandes
Date: Thu May 28 2020 - 14:23:31 EST


Hi Peter,

On Thu, May 28, 2020 at 07:01:28PM +0200, Peter Zijlstra wrote:
> On Sun, May 24, 2020 at 10:00:46AM -0400, Phil Auld wrote:
> > On Fri, May 22, 2020 at 05:35:24PM -0400 Joel Fernandes wrote:
> > > On Fri, May 22, 2020 at 02:59:05PM +0200, Peter Zijlstra wrote:
> > > [..]
> > > > > > It doesn't allow tasks to form their own groups (by for example setting
> > > > > > the key to that of another task).
> > > > >
> > > > > So for this, I was thinking of making the prctl pass in an integer. And 0
> > > > > would mean untagged. Does that sound good to you?
> > > >
> > > > A TID, I think. If you pass your own TID, you tag yourself as
> > > > not-sharing. If you tag yourself with another task's TID, you can do
> > > > ptrace tests to see if you're allowed to observe their junk.
> > >
> > > But that would require a bunch of tasks agreeing on which TID to tag with.
> > > For example, if 2 tasks tag with each other's TID, then they would have
> > > different tags and not share.
>
> Well, don't do that then ;-)

We could also guard it with a mutex. The first task to set the TID wins, and
the other task just reuses the cookie of the TID that won.

But I think we cannot just use the TID value as the cookie, due to TID
wrap-around and reuse. Otherwise we could accidentally group 2 tasks. Instead, I
suggest we keep the TID as the interface, per your suggestion, and do the
needed ptrace checks, but convert the TID to the task_struct pointer value
and use that as the cookie for the group of tasks sharing a core.
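
To make the idea concrete, here is a rough userspace sketch of what I have
in mind. It is not kernel code: `struct task`, `tag_with()` and
`same_group()` are hypothetical names made up for illustration, the TID
lookup and the ptrace checks are left out, and it ignores task lifetime
(a real version would have to hold a reference on the task whose pointer
is used as the cookie). It only shows the "first one wins" rule under a
mutex, with the cookie derived from a pointer rather than from the TID.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for task_struct; not the real kernel layout. */
struct task {
	int tid;
	uintptr_t core_cookie;	/* 0 means untagged */
};

/* Serializes all tagging so two tasks cannot race each other. */
static pthread_mutex_t cookie_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Tag 'self' with the task 'target' (already looked up from a TID).
 * If 'target' has a cookie, reuse it; otherwise the cookie is the
 * address of 'target'. Returns the cookie that 'self' ended up with.
 */
static uintptr_t tag_with(struct task *self, struct task *target)
{
	uintptr_t cookie;

	assert(self != NULL && target != NULL);

	pthread_mutex_lock(&cookie_lock);
	cookie = target->core_cookie ? target->core_cookie
				     : (uintptr_t)target;
	self->core_cookie = cookie;
	pthread_mutex_unlock(&cookie_lock);

	return cookie;
}

/* Two tasks are in the same group if they carry the same non-zero cookie. */
static int same_group(const struct task *a, const struct task *b)
{
	return a->core_cookie != 0 && a->core_cookie == b->core_cookie;
}
```

With this rule, if task A tags itself with B and then B tags itself with A,
A's call wins and B simply picks up the cookie A already holds, so the two
end up in one group instead of getting different tags.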

Thoughts?

thanks,

- Joel

> > > What's wrong with passing in an integer instead? In any case, we would do the
> > > CAP_SYS_ADMIN check to limit who can do it.
>
> So the actual permission model can be different depending on how broken
> the hardware is.
>
> > > Also, one thing CGroup interface allows is an external process to set the
> > > cookie, so I am wondering if we should use sched_setattr(2) instead of, or in
> > > addition to, the prctl(2). That way, we can drop the CGroup interface
> > > completely. How do you feel about that?
> > >
> >
> > I think it should be an arbitrary 64-bit value, in both interfaces, to avoid
> > any potential reuse security issues.
> >
> > I think the cgroup interface could be extended not to be a boolean but take
> > the value. With 0 being untagged as now.
>
> How do you avoid reuse in such a huge space? That just creates yet
> another problem for the kernel to keep track of who is who.
>
> With random u64 numbers, it even becomes hard to determine if you're
> sharing at all or not.
>
> Now, with the current SMT+MDS trainwreck, any sharing is bad because it
> allows leaking kernel privates. But under a less severe threat scenario,
> say where only user data would be at risk, the ptrace() tests make
> sense, but those become really hard with random u64 numbers too.
>
> What would the purpose of random u64 values be for cgroups? That only
> replicates the problem of determining uniqueness there. Then you can get
> two cgroups unintentionally sharing because you got lucky.
>
> Also, fundamentally, we cannot have more threads than TID space, it's a
> natural identifier.