Re: SCHED_DEADLINE with CPU affinity
From: Juri Lelli
Date: Mon Jan 13 2020 - 04:22:38 EST
Hi,
Sorry for the delay in replying (Xmas + catching up w/ emails).
On 24/12/19 11:03, Philipp Stanner wrote:
> On Wed, 20.11.2019, 09:50 +0100 Juri Lelli wrote:
> > Hi Philipp,
>
> Hey Juri,
>
> thanks so far; we did manage to make it work with exclusive CPU-sets.
Good. :-)
> On 19/11/19 23:20, Philipp Stanner wrote:
> >
> > > from implementing our intended architecture.
> > >
> > > Now, the questions we're having are:
> > >
> > > 1. Why does the kernel do this? What is the problem with
> > > scheduling with SCHED_DEADLINE on a specific core? In contrast, how
> > > is it handled on single-core systems etc.? Why this artificial
> > > limitation?
> >
> > Please also have a look (you only mentioned the manpage, so in case
> > you missed it) at
> >
> > https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
> >
> > and the document in general should hopefully answer why we need
> > admission control and explain the current limitations regarding
> > affinities.
> >
> > > 2. How can we possibly implement this? We don't want to use
> > > SCHED_FIFO, because out-of-control tasks would freeze the entire
> > > container.
> >
> > I experimented a bit with this kind of setup myself in the past, and
> > I think I made it work by pre-configuring exclusive cpusets (similar
> > to what is detailed in the doc above) and then starting containers
> > inside such exclusive sets with the podman run --cgroup-parent option.
> >
> > I don't have proper instructions for how to do this yet (I plan to
> > put them together soon-ish), but please see if you can make it work
> > with this hint.
>
> I fear I have not yet quite understood why this "workaround" leads to
> (presumably) the same results as set_affinity would. From what I have
> read, I understand it as follows: for SCHED_DEADLINE, admission
> control tries to guarantee that the requested policy can be honored.
> To do so, it analyzes the current workload, taking into account
> especially the number of cores.
>
> Now, with a pre-configured set, the kernel knows which tasks will run
> on which cores, so it is able to judge whether a process can be
> deadline-scheduled or not. But with the default approach, you could
> start your processes as SCHED_OTHER, set SCHED_DEADLINE as their
> policy, and later many of them could suddenly call set_affinity,
> asking to run on the same core and thereby provoking collisions.
But setting affinity would still have to pass admission control, and
should fail in the case you are describing (IIUC).
https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5433
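To make this a bit more concrete, here is a rough userspace model (not kernel code) of the checks involved, assuming symmetric CPUs and the default sched_rt_runtime_us/sched_rt_period_us ratio of 950000/1000000. The struct and function names (rd_model, dl_admit, dl_setaffinity_allowed) are hypothetical, and the logic is a simplified reading of the kernel: sched_setattr() refuses (EPERM) to make a task SCHED_DEADLINE if its affinity does not cover its whole root domain, the sum of runtime/period in a root domain must stay within the per-CPU limit times the number of CPUs, and sched_setaffinity() fails (EBUSY) for a deadline task whose new mask does not cover the root domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point shift for bandwidths (the kernel also uses 20 bits). */
#define BW_SHIFT 20

/* Hypothetical, simplified model of one root domain, e.g. an exclusive cpuset. */
struct rd_model {
	uint64_t span;     /* bitmask of the CPUs in the root domain */
	uint64_t max_bw;   /* per-CPU limit: rt_runtime/rt_period, fixed point */
	uint64_t total_bw; /* sum of admitted SCHED_DEADLINE bandwidths */
};

/* runtime/period as a BW_SHIFT fixed-point fraction. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Number of CPUs in a mask. */
static unsigned int ncpus(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1) /* clear the lowest set bit */
		n++;
	return n;
}

/* True if every CPU in 'a' is also in 'b'. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Model of a task becoming SCHED_DEADLINE: its affinity must cover the
 * whole root domain, and the new bandwidth must fit in max_bw * nr_cpus.
 * Returns true and accounts the bandwidth on success.
 */
static bool dl_admit(struct rd_model *rd, uint64_t affinity,
		     uint64_t runtime_ns, uint64_t period_ns)
{
	uint64_t bw;

	/* In the kernel, sched_setattr() would fail with EINVAL here. */
	assert(runtime_ns <= period_ns);
	if (!mask_subset(rd->span, affinity))
		return false; /* affinity smaller than the root domain */
	bw = to_ratio(period_ns, runtime_ns);
	if (rd->total_bw + bw > rd->max_bw * ncpus(rd->span))
		return false; /* not enough bandwidth left */
	rd->total_bw += bw;
	return true;
}

/* Model of sched_setaffinity() on a task that is already SCHED_DEADLINE. */
static bool dl_setaffinity_allowed(const struct rd_model *rd, uint64_t new_mask)
{
	return mask_subset(rd->span, new_mask);
}
```

For example, on a two-CPU root domain (span 0x3) three tasks with 5 ms runtime every 10 ms are admitted, a fourth is refused for lack of bandwidth, and pinning an admitted task to CPU0 alone (mask 0x1) is refused. With an exclusive cpuset, the root domain span is just the CPUs of that set, so tasks started inside it pass the affinity check without calling set_affinity at all.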
Best,
Juri