Re: [PATCH 01/23] sched: Provide sched_set_fifo()

From: Paul E. McKenney
Date: Wed Apr 22 2020 - 09:11:42 EST


On Wed, Apr 22, 2020 at 01:27:20PM +0200, Peter Zijlstra wrote:
> SCHED_FIFO (or any static priority scheduler) is a broken scheduler
> model; it is fundamentally incapable of resource management, the one
> thing an OS is actually supposed to do.
>
> It is impossible to compose static priority workloads. One cannot take
> two well designed and functional static priority workloads and mash
> them together and still expect them to work.
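A toy illustration of the composition problem. It assumes a single CPU, integer ticks, and tasks that are all released at tick 0. struct toy_task, simulate(), and the numbers are hypothetical and are not kernel code. Each workload meets its deadline when run alone. When they are combined, the priorities their authors picked independently make one of them miss, even though the total demand fits and the opposite ordering would work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy task: all tasks are released at tick 0 on one CPU. */
struct toy_task {
	const char *name;
	int prio;	/* larger value wins, as with SCHED_FIFO */
	int demand;	/* CPU ticks the task needs */
	int deadline;	/* tick by which the work must be complete */
	int done;	/* output: ticks received */
	int finish;	/* output: tick at which demand was met, -1 if never */
};

/*
 * Run @n tasks for @horizon ticks under strict static priority and
 * report whether every task finished by its deadline.
 */
static bool simulate(struct toy_task *t, int n, int horizon)
{
	bool ok = true;

	for (int i = 0; i < n; i++) {
		t[i].done = 0;
		/* A task with no work is trivially finished at tick 0. */
		t[i].finish = t[i].demand == 0 ? 0 : -1;
	}

	for (int tick = 0; tick < horizon; tick++) {
		struct toy_task *run = NULL;

		/* The highest-priority task with work left gets this tick. */
		for (int i = 0; i < n; i++)
			if (t[i].done < t[i].demand &&
			    (run == NULL || t[i].prio > run->prio))
				run = &t[i];
		if (run == NULL)
			break;			/* all work done; CPU idles */
		if (++run->done == run->demand)
			run->finish = tick + 1;	/* completes at the end of this tick */
	}

	for (int i = 0; i < n; i++)
		if (t[i].finish < 0 || t[i].finish > t[i].deadline)
			ok = false;
	return ok;
}
```

Consider A = {prio 50, demand 5, deadline 10} and B = {prio 40, demand 2, deadline 3}. Each passes on its own. Run together, A takes ticks 0-4, so B finishes at tick 7 and misses its deadline. Raising B above A makes both fit. Choosing that order requires knowing both workloads, which neither author has.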
>
> Therefore it doesn't make sense to expose the priority field; the
> kernel is fundamentally incapable of setting a sensible value, it
> needs systems knowledge that it doesn't have.
>
> Take away sched_setscheduler() / sched_setattr() from modules and
> replace them with:
>
> - sched_set_fifo(p); create a FIFO task (at prio 50)
> - sched_set_fifo_low(p); create a task higher than NORMAL,
> which ends up being a FIFO task at prio 1.
> - sched_set_normal(p, nice); (re)set the task to normal
>
> This stops the proliferation of randomly chosen, and irrelevant, FIFO
> priorities that don't really mean anything anyway.
>
> The system administrator/integrator, whoever has insight into the
> actual system design and requirements (userspace) can set-up
> appropriate priorities if and when needed.
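Below is a userspace sketch of the priorities the proposed helpers pick. It assumes MAX_RT_PRIO is 100. HYP_MAX_RT_PRIO is a hypothetical stand-in, because the kernel constant is not visible to userspace. The sketch checks both values against sched_get_priority_min()/sched_get_priority_max() for SCHED_FIFO. apply_fifo() applies a priority through sched_setscheduler(2). It is only an analogue: the in-kernel helpers go through sched_setscheduler_nocheck(), which skips the permission checks that this call is subject to.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Assumption: the kernel's MAX_RT_PRIO is 100, so user-visible
 * SCHED_FIFO priorities run from 1 to 99. Hypothetical stand-in only.
 */
#define HYP_MAX_RT_PRIO 100

/* What sched_set_fifo() picks: the middle of the RT range (50). */
static struct sched_param fifo_param(void)
{
	struct sched_param sp = { .sched_priority = HYP_MAX_RT_PRIO / 2 };
	return sp;
}

/* What sched_set_fifo_low() picks: the lowest FIFO level, just above NORMAL. */
static struct sched_param fifo_low_param(void)
{
	struct sched_param sp = { .sched_priority = 1 };
	return sp;
}

/* Nonzero if @sp is a priority the running kernel accepts for SCHED_FIFO. */
static int fifo_prio_valid(const struct sched_param *sp)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	return lo != -1 && hi != -1 &&
	       sp->sched_priority >= lo && sp->sched_priority <= hi;
}

/*
 * Apply @sp to @pid (0 = caller). Unlike the _nocheck kernel path, this
 * needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO. Returns 0 or -errno.
 */
static int apply_fifo(pid_t pid, const struct sched_param *sp)
{
	if (sched_setscheduler(pid, SCHED_FIFO, sp) == -1)
		return -errno;
	return 0;
}
```

A priority of 0 is rejected for SCHED_FIFO, so apply_fifo(0, &(struct sched_param){ .sched_priority = 0 }) fails whether or not the caller is privileged.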

The sched_setscheduler_nocheck() calls in rcu_spawn_gp_kthread(),
rcu_cpu_kthread_setup(), and rcu_spawn_one_boost_kthread() all stay as
is because they all use the rcutree.kthread_prio boot parameter, which is
set at boot time by the system administrator (or {who,what}ever), correct?

Or did my email reader eat a patch or two?
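Below is a hypothetical userspace sketch of the distinction drawn above. The priority comes from a value the administrator supplies, in the spirit of rcutree.kthread_prio, instead of a number the kernel picks. from_admin_prio(), struct prio_choice, and the clamping rule are assumptions made for illustration. They are not RCU's actual code.

```c
#include <sched.h>

/* Hypothetical result: which policy to use and with what parameters. */
struct prio_choice {
	int policy;
	struct sched_param sp;
};

/*
 * Map an administrator-supplied priority to a policy. Assumed rule:
 * 0 or less keeps the task SCHED_OTHER (SCHED_NORMAL), and anything
 * above the largest valid FIFO priority is clamped down to it.
 */
static struct prio_choice from_admin_prio(int admin_prio)
{
	struct prio_choice c = {
		.policy = SCHED_OTHER,
		.sp = { .sched_priority = 0 },
	};
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (admin_prio <= 0)
		return c;		/* admin asked for no RT priority */
	if (hi != -1 && admin_prio > hi)
		admin_prio = hi;	/* clamp into the valid FIFO range */
	c.policy = SCHED_FIFO;
	c.sp.sched_priority = admin_prio;
	return c;
}
```

Here the kernel picks no priority of its own. The number comes entirely from the person who knows the system.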

Thanx, Paul

> Cc: airlied@xxxxxxxxxx
> Cc: alexander.deucher@xxxxxxx
> Cc: awalls@xxxxxxxxxxxxxxxx
> Cc: axboe@xxxxxxxxx
> Cc: broonie@xxxxxxxxxx
> Cc: daniel.lezcano@xxxxxxxxxx
> Cc: gregkh@xxxxxxxxxxxxxxxxxxx
> Cc: hannes@xxxxxxxxxxx
> Cc: herbert@xxxxxxxxxxxxxxxxxxx
> Cc: hverkuil@xxxxxxxxx
> Cc: john.stultz@xxxxxxxxxx
> Cc: nico@xxxxxxxxxxx
> Cc: paulmck@xxxxxxxxxx
> Cc: rafael.j.wysocki@xxxxxxxxx
> Cc: rmk+kernel@xxxxxxxxxxxxxxxx
> Cc: sudeep.holla@xxxxxxx
> Cc: tglx@xxxxxxxxxxxxx
> Cc: ulf.hansson@xxxxxxxxxx
> Cc: wim@xxxxxxxxxxxxxxxxxx
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>
> ---
> include/linux/sched.h | 3 +++
> kernel/sched/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1631,6 +1631,9 @@ extern int idle_cpu(int cpu);
> extern int available_idle_cpu(int cpu);
> extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
> extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
> +extern int sched_set_fifo(struct task_struct *p);
> +extern int sched_set_fifo_low(struct task_struct *p);
> +extern int sched_set_normal(struct task_struct *p, int nice);
> extern int sched_setattr(struct task_struct *, const struct sched_attr *);
> extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
> extern struct task_struct *idle_task(int cpu);
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5055,6 +5055,8 @@ static int _sched_setscheduler(struct ta
> * @policy: new policy.
> * @param: structure containing the new RT priority.
> *
> + * Use sched_set_fifo(), read its comment.
> + *
> * Return: 0 on success. An error code otherwise.
> *
> * NOTE that the task may be already dead.
> @@ -5097,6 +5099,51 @@ int sched_setscheduler_nocheck(struct ta
> }
> EXPORT_SYMBOL_GPL(sched_setscheduler_nocheck);
>
> +/*
> + * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
> + * incapable of resource management, which is the one thing an OS really should
> + * be doing.
> + *
> + * This is of course the reason it is limited to privileged users only.
> + *
> + * Worse still; it is fundamentally impossible to compose static priority
> + * workloads. You cannot take two correctly working static prio workloads
> + * and smash them together and still expect them to work.
> + *
> + * For this reason 'all' FIFO tasks the kernel creates are basically at:
> + *
> + * MAX_RT_PRIO / 2
> + *
> + * The administrator _MUST_ configure the system, the kernel simply doesn't
> + * know enough information to make a sensible choice.
> + */
> +int sched_set_fifo(struct task_struct *p)
> +{
> + struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 };
> + return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo);
> +
> +/*
> + * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
> + */
> +int sched_set_fifo_low(struct task_struct *p)
> +{
> + struct sched_param sp = { .sched_priority = 1 };
> + return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo_low);
> +
> +int sched_set_normal(struct task_struct *p, int nice)
> +{
> + struct sched_attr attr = {
> + .sched_policy = SCHED_NORMAL,
> + .sched_nice = nice,
> + };
> + return sched_setattr_nocheck(p, &attr);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_normal);
> +
> static int
> do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
> {
>
>
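Below is a userspace sketch of what sched_set_normal() amounts to. It uses sched_setscheduler(2) and setpriority(2) in place of the in-kernel sched_setattr_nocheck(). set_normal() is a hypothetical name. The kernel helper sets the policy and the nice value in a single call. This sketch needs two system calls, so for a moment the task can be observed with the new policy and the old nice value.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/types.h>

/*
 * Move @pid (0 = caller) to SCHED_OTHER, userspace's name for
 * SCHED_NORMAL, and set its nice value. Returns 0 or -errno.
 */
static int set_normal(pid_t pid, int nice)
{
	/* sched_priority must be 0 for non-RT policies. */
	struct sched_param sp = { .sched_priority = 0 };

	if (sched_setscheduler(pid, SCHED_OTHER, &sp) == -1)
		return -errno;
	/* Lowering nice below the current value may need CAP_SYS_NICE. */
	if (setpriority(PRIO_PROCESS, (id_t)pid, nice) == -1)
		return -errno;
	return 0;
}
```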