Re: [PATCH v6 13/21] sched: Admit forcefully-affined tasks into SCHED_DEADLINE

From: Juri Lelli
Date: Fri May 21 2021 - 04:39:44 EST


On 21/05/21 08:15, Quentin Perret wrote:
> On Friday 21 May 2021 at 07:25:51 (+0200), Juri Lelli wrote:
> > On 20/05/21 19:01, Will Deacon wrote:
> > > On Thu, May 20, 2021 at 02:38:55PM +0200, Daniel Bristot de Oliveira wrote:
> > > > On 5/20/21 12:33 PM, Quentin Perret wrote:
> > > > > On Thursday 20 May 2021 at 11:16:41 (+0100), Will Deacon wrote:
> > > > >> Ok, thanks for the insight. In which case, I'll go with what we discussed:
> > > > >> require admission control to be disabled for sched_setattr() but allow
> > > > >> execve() to a 32-bit task from a 64-bit deadline task with a warning (this
> > > > >> is probably similar to CPU hotplug?).
> > > > >
> > > > > Still not sure that we can let execve go through ... It will break AC
> > > > > all the same, so it should probably fail as well if AC is on IMO
> > > > >
> > > >
> > > > If the cpumask of the 32-bit task differs from that of the 64-bit task executing it,
> > > > the admission control needs to be re-executed, and it could fail. So I see this
> > > > operation as equivalent to sched_setaffinity(). This will likely be true for future
> > > > schedulers that will allow arbitrary affinities (AC should run on affinity
> > > > change, and could fail).
> > > >
> > > > I would vote with Juri: "I'd go with fail hard if AC is on, let it
> > > > pass if AC is off (supposedly the user knows what to do)," (also hope nobody
> > > > complains until we add better support for affinity, and use this as a motivation
> > > > to get back on this front).
> > >
> > > I can have a go at implementing it, but I don't think it's a great solution
> > > and here's why:
> > >
> > > Failing an execve() is _very_ likely to be fatal to the application. It's
> > > also very likely that the task calling execve() doesn't know whether the
> > > program it's trying to execute is 32-bit or not. Consequently, if we go
> > > with failing execve() then all that will happen is that people will disable
> > > admission control altogether.
>
> Right, but only on these dumb 32-bit asymmetric systems, and only if we
> care about running 32-bit deadline tasks -- which I seriously doubt for
> the Android use-case.
>
> Note that running deadline tasks is also a privileged operation, it
> can't be done by random apps.
>
> > > That has a negative impact on "pure" 64-bit
> > > applications and so I think we end up with the tail wagging the dog because
> > > admission control will be disabled for everybody just because there is a
> > > handful of 32-bit programs which may get executed. I understand that it
> > > also means that RT throttling would be disabled.
> >
> > Completely understand your perplexity. But how can the kernel still give
> > guarantees to "pure" 64-bit applications if there are 32-bit
> > applications around that essentially broke admission control when they
> > were restricted to a subset of cores?
> >
> > > Allowing the execve() to continue with a warning is very similar to the
> > > case in which all the 64-bit CPUs are hot-unplugged at the point of
> > > execve(), and this is much closer to the illusion that this patch series
> > > intends to provide.
> >
> > So, for hotplug we currently have a check that would make hotplug
> > operations fail if removing a CPU would mean not enough bandwidth to run
> > the currently admitted set of DEADLINE tasks.
>
> Aha, wasn't aware. Any pointers to that check for my education?

Hotplug ends up calling dl_cpu_busy() (after the CPU being hot-unplugged
has been removed), IIRC. So, if that fails, the operation is undone.
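
To make the idea concrete, here is a minimal userspace sketch of that kind of
check. It is not the kernel's dl_cpu_busy(); the struct, the function names,
the fixed-point shift and the per-CPU bandwidth cap are all assumptions made
for illustration. It sums the runtime/period utilization of the admitted
tasks and reports whether the CPUs left after the unplug could still hold it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point scale for bandwidth fractions: 1.0 == BW_UNIT (sketch only). */
#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

/* Hypothetical descriptor of an admitted deadline task. */
struct dl_task_sketch {
	uint64_t runtime_ns;
	uint64_t period_ns;
};

/* Utilization (runtime / period) as a fixed-point fraction. */
static uint64_t task_bw(const struct dl_task_sketch *t)
{
	assert(t->period_ns != 0);
	return (t->runtime_ns << BW_SHIFT) / t->period_ns;
}

/*
 * Return true if taking one CPU offline would leave less bandwidth than
 * the admitted tasks already use, i.e. the unplug should be refused.
 * per_cpu_bw is the fraction of each CPU that deadline tasks may use.
 */
static bool unplug_would_overflow(const struct dl_task_sketch *tasks, size_t n,
				  unsigned int online_cpus, uint64_t per_cpu_bw)
{
	uint64_t total = 0;
	uint64_t cap_after;
	size_t i;

	assert(online_cpus > 0);

	for (i = 0; i < n; i++)
		total += task_bw(&tasks[i]);

	/* Capacity that remains once the CPU is gone. */
	cap_after = (uint64_t)(online_cpus - 1) * per_cpu_bw;

	return total > cap_after;
}
```

With two tasks of 50ms/100ms each and an assumed 95% cap per CPU, going from
two CPUs to one would be refused, while going from three to two would pass.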

> > > So, personally speaking, I would prefer the behaviour where we refuse to
> > > admit 32-bit tasks via sched_setattr() if the root domain contains
> > > 64-bit CPUs, but we _don't_ fail execve() of a 32-bit program from a
> > > 64-bit deadline task.
> >
> > OK, this is interesting and I guess a very valid alternative. That would
> > force users to create exclusive domains for 32-bit tasks, right?
>
> FWIW this is not practical at all for our use-cases, the implications of
> splitting the system in independent root-domains are way too important
> for us to be able to recommend that. Disabling AC, OTOH, sounds simple
> enough. The RT throttling part is the only 'worrying' part, but even
> that may not be the end of the world.

Note that RT throttling (SCHED_{FIFO,RR}) is not handled by DEADLINE
servers yet.

Best,
Juri