Re: [PATCH RFC] sched: Add a per-thread core scheduling interface

From: Phil Auld
Date: Thu May 28 2020 - 14:17:36 EST

Next message: Sedat Dilek: "Re: [PATCHSET v5 0/12] Add support for async buffered reads"
Previous message: David Miller: "Re: remove most callers of kernel_setsockopt v3"
In reply to: Peter Zijlstra: "Re: [PATCH RFC] sched: Add a per-thread core scheduling interface"
Next in thread: Phil Auld: "Re: [PATCH RFC] sched: Add a per-thread core scheduling interface"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, May 28, 2020 at 07:01:28PM +0200 Peter Zijlstra wrote:
> On Sun, May 24, 2020 at 10:00:46AM -0400, Phil Auld wrote:
> > On Fri, May 22, 2020 at 05:35:24PM -0400 Joel Fernandes wrote:
> > > On Fri, May 22, 2020 at 02:59:05PM +0200, Peter Zijlstra wrote:
> > > [..]
> > > > > > It doesn't allow tasks to form their own groups (by for example setting
> > > > > > the key to that of another task).
> > > > >
> > > > > So for this, I was thinking of making the prctl pass in an integer. And 0
> > > > > would mean untagged. Does that sound good to you?
> > > >
> > > > A TID, I think. If you pass your own TID, you tag yourself as
> > > > not-sharing. If you tag yourself with another task's TID, you can do
> > > > ptrace tests to see if you're allowed to observe their junk.
> > >
> > > But that would require a bunch of tasks agreeing on which TID to tag with.
> > > For example, if 2 tasks tag with each other's TID, then they would have
> > > different tags and not share.
>
> Well, don't do that then ;-)
>

That was a poorly worded example :)

The point I was trying to make was more that one TID of a group (not cgroup!)
of tasks is just an arbitrary value.

At a single process (or pair rather) level, sure, you can see it as an
identifier of whom you want to share with, but even then you have to tag
both processes with this. And it has less meaning when you want to share
with multiple tasks.

> > > What's wrong with passing in an integer instead? In any case, we would do the
> > > CAP_SYS_ADMIN check to limit who can do it.
>
> So the actual permission model can be different depending on how broken
> the hardware is.
>
> > > Also, one thing CGroup interface allows is an external process to set the
> > > cookie, so I am wondering if we should use sched_setattr(2) instead of, or in
> > > addition to, the prctl(2). That way, we can drop the CGroup interface
> > > completely. How do you feel about that?
> > >
> >
> > I think it should be an arbitrary 64bit value, in both interfaces to avoid
> > any potential reuse security issues.
> >
> > I think the cgroup interface could be extended not to be a boolean but take
> > the value. With 0 being untagged as now.
>
> How do you avoid reuse in such a huge space? That just creates yet
> another problem for the kernel to keep track of who is who.
>

The kernel doesn't care or have to track anything. The admin does.
At the kernel level it's just matching cookies.

Tasks A, B and C can all share a core, so you give them each A's TID as a
cookie. Task A then exits. Now B and C are using essentially a random value.
Task D comes along and wants to share with B and C. You have to tag it
with A's old TID, which has no meaning at this point.

And if A's TID ever gets reused, the new A' gets to share too. At some
level aren't those still 32 bits?
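
To make the scenario above concrete, here is a small Python model. It is
not kernel code. It assumes the cookie is simply the raw TID value and that
the scheduler does nothing more than compare cookies for equality; the task
names, the TID 1234 and the helper can_share_core() are all made up for
illustration.

```python
# Hypothetical model, not kernel code. Assumption: the cookie is the raw
# TID value and the scheduler only compares cookies for equality.
UNTAGGED = 0

cookies = {}  # task name -> cookie


def can_share_core(a, b):
    """Two tasks may share a core only if their cookies are equal."""
    return cookies.get(a, UNTAGGED) == cookies.get(b, UNTAGGED)


A_TID = 1234  # made-up TID for task A

# A, B and C agree to share, so each is tagged with A's TID.
for name in ("A", "B", "C"):
    cookies[name] = A_TID

# A exits. B and C keep a cookie that no longer names a live task.
del cookies["A"]

# D wants to join B and C, so it has to be tagged with A's stale TID.
cookies["D"] = A_TID

# The TID is later recycled for an unrelated task, which tags itself with
# its own TID intending "not sharing".
cookies["A_new"] = A_TID

print(can_share_core("D", "B"))      # True
print(can_share_core("A_new", "B"))  # True, although nobody asked for it
```

In this model the value 1234 carries no information once A is gone, and the
recycled TID silently joins the group.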

> With random u64 numbers, it even becomes hard to determine if you're
> sharing at all or not.
>
> Now, with the current SMT+MDS trainwreck, any sharing is bad because it
> allows leaking kernel privates. But under a less severe threat scenario,
> say where only user data would be at risk, the ptrace() tests make
> sense, but those become really hard with random u64 numbers too.
>
> What would the purpose of random u64 values be for cgroups? That only
> replicates the problem of determining uniqueness there. Then you can get
> two cgroups unintentionally sharing because you got lucky.
>

Seems that would be more flexible for the admin.

What if you had two cgroups you wanted to allow to run together? Or a
cgroup and a few processes from a different one (say with different
quotas or something).

I don't have such use cases so I don't feel that strongly but it seemed
more flexible and followed the mechanism-in-kernel/policy-in-userspace
dictum rather than basing the functionality on the implementation details.
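
As a rough illustration of that flexibility, the sketch below models an
admin who picks one arbitrary non-zero 64-bit value and gives it to two
cgroups plus a single process from a third. It is a Python model only, not
a proposed interface. The cgroup and task names are invented, and the rule
that a per-task value overrides the cgroup's value is an assumption of the
sketch, not something settled in this thread.

```python
import secrets

UNTAGGED = 0  # 0 means untagged, as in the current cgroup interface


def new_cookie():
    """Return a random non-zero 64-bit value chosen by the admin."""
    while True:
        value = secrets.randbits(64)
        if value != UNTAGGED:
            return value


shared = new_cookie()

# Hypothetical policy: two cgroups get the same value, a third stays untagged.
cgroup_cookie = {"web": shared, "db": shared, "batch": UNTAGGED}
task_cgroup = {"nginx": "web", "postgres": "db",
               "backup": "batch", "indexer": "batch"}

# One process from the "batch" cgroup is tagged individually.
# Assumption: a per-task value overrides the cgroup's value.
task_cookie = {"indexer": shared}


def cookie_of(task):
    """Effective cookie: the per-task value if set, else the cgroup's."""
    return task_cookie.get(task, cgroup_cookie[task_cgroup[task]])


def can_share_core(a, b):
    """The only thing the 'kernel' side does: compare cookies."""
    return cookie_of(a) == cookie_of(b)


print(can_share_core("nginx", "postgres"))  # True: two cgroups, one value
print(can_share_core("nginx", "indexer"))   # True: per-task tag
print(can_share_core("nginx", "backup"))    # False: backup is untagged
```

All of the grouping decisions live in the three tables the admin fills in;
the comparison itself never needs to know what the value means.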

Cheers,
Phil

> Also, fundamentally, we cannot have more threads than TID space, it's a
> natural identifier.
>
