Re: SCHED_DEADLINE with CPU affinity
From: Juri Lelli
Date: Wed Nov 20 2019 - 03:50:36 EST
Hi Philipp,
On 19/11/19 23:20, Philipp Stanner wrote:
> Hey folks,
> (please put me in CC when answering, I'm not subscribed)
>
> I'm currently a working student in the embedded industry. We have a device
> where we need to be able to process network data within a certain deadline.
> At the same time, safety is a primary requirement; that's why we build
> everything fully redundant. Meaning: we have two network interfaces, each
> with its IRQ bound to one CPU core, and for each we spawn a container
> (systemd-nspawn, cgroups based) which in turn is bound to the corresponding
> CPU (CPU affinity masked).
>
>    Container0              Container1
>   -----------------       -----------------
>   |               |       |               |
>   |    Proc. A    |       |    Proc. A'   |
>   |    Proc. B    |       |    Proc. B'   |
>   |               |       |               |
>   -----------------       -----------------
>           ^                       ^
>           |                       |
>         CPU 0                   CPU 1
>           |                       |
>       IRQ eth0                IRQ eth1
>
>
> Within each container several processes are started, ranging from systemd
> (SCHED_OTHER) to two (soft) real-time critical processes, which we want to
> run under SCHED_DEADLINE.
>
> Now, I've worked through the manpage describing scheduling policies, and it
> seems that our scenario is forbidden by the kernel. I've done some tests with
> the syscalls sched_setattr and sched_setaffinity, trying to activate
> SCHED_DEADLINE while also binding to a certain core. It fails with EINVAL or
> EBUSY, depending on the order of the syscalls.
>
> I've read that the kernel performs plausibility checks when you ask for a
> new deadline task to be scheduled, and I assume this check is what prevents us
> from implementing our intended architecture.
Yeah, admission control.
>
> Now, the questions we're having are:
>
> 1. Why does the kernel do this, what is the problem with scheduling with
> SCHED_DEADLINE on a certain core? In contrast, how is it handled when
> you have single core systems etc.? Why this artificial limitation?
Please also have a look (you only mentioned the manpage, so in case you
missed it) at
https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
The document in general should hopefully answer why we need admission
control and explain the current limitations regarding affinities.
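As a rough illustration of the admission test described in that document:
the sum of runtime/period over all deadline tasks must not exceed
M * (sched_rt_runtime_us / sched_rt_period_us), where M is the number of
CPUs in the root domain. The sketch below only mirrors that arithmetic in
user space; the function name and the task tuples are made up for
illustration, the defaults are the usual 950000/1000000, and it ignores the
case where the check is disabled.

```python
from fractions import Fraction


def admits(tasks, cpus, rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Return True if the task set passes the utilization-based test.

    tasks: iterable of (runtime, period) pairs, both in the same time unit.
    cpus:  number of CPUs in the root domain (M in the documentation).
    Hypothetical helper: this is not a kernel or libc API.
    """
    # Bandwidth available to deadline tasks across the whole root domain.
    capacity = cpus * Fraction(rt_runtime_us, rt_period_us)
    # Total bandwidth requested: sum of runtime_i / period_i.
    requested = sum((Fraction(r, p) for r, p in tasks), Fraction(0))
    return requested <= capacity


if __name__ == "__main__":
    # Two tasks, each asking for 10 ms every 100 ms, on a one-CPU root domain.
    print(admits([(10, 100), (10, 100)], cpus=1))
    # A task set asking for 100% of one CPU does not fit under the 95% cap.
    print(admits([(60, 100), (40, 100)], cpus=1))
```

Because M counts the CPUs of the root domain, a one-CPU exclusive cpuset
gets its own budget, and that is why the per-core setup has to go through
cpusets rather than through sched_setaffinity alone.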
> 2. How can we possibly implement this? We don't want to use SCHED_FIFO,
> because out-of-control tasks would freeze the entire container.
I experimented a bit with this kind of setup myself in the past and I
think I made it work by pre-configuring exclusive cpusets (similar to
what is detailed in the doc above) and then starting containers inside
such exclusive sets with the podman run --cgroup-parent option.
I don't have proper instructions yet for how to do this (plan to put
them together soon-ish), but please see if you can make it work with
this hint.
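To make the hint a bit more concrete, here is a minimal, untested sketch of
the cpuset part only. It assumes a cgroup v1 cpuset hierarchy and the file
names used in the cpuset example of the document above (cpuset.cpus,
cpuset.mems, cpuset.cpu_exclusive, cpuset.sched_load_balance); the mount
point /dev/cpuset, the cpuset name "cpu0" and both helper functions are
placeholders. By default it only prints what it would write.

```python
from pathlib import Path


def exclusive_cpuset_plan(root, name, cpu, mem_node=0):
    """Ordered (file, value) writes for one exclusive single-CPU cpuset.

    Assumes cgroup v1 cpuset file names; hypothetical helper.
    """
    root = Path(root)
    child = root / name
    return [
        (child / "cpuset.cpus", str(cpu)),          # CPU owned by this set
        (child / "cpuset.mems", str(mem_node)),     # memory node for the set
        (root / "cpuset.cpu_exclusive", "1"),
        (root / "cpuset.sched_load_balance", "0"),  # no balancing at the root
        (child / "cpuset.cpu_exclusive", "1"),      # make the child exclusive
    ]


def apply_plan(plan, dry_run=True):
    """Return the equivalent shell lines; write the files unless dry_run."""
    lines = []
    for path, value in plan:
        lines.append(f"echo {value} > {path}")
        if not dry_run:
            # Creating the directory is what creates the cpuset itself.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(value)
    return lines


if __name__ == "__main__":
    for line in apply_plan(exclusive_cpuset_plan("/dev/cpuset", "cpu0", 0)):
        print(line)
```

Running it with dry_run=False needs root and a mounted cpuset controller.
The container would then be started with the resulting cpuset as its cgroup
parent; the exact path to pass there depends on the local cgroup layout and
is not shown.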
Best,
Juri