Re: [PATCH v2 2/2] sched/uclamp: Protect uclamp fast path code with static key

From: Patrick Bellasi
Date: Wed Jun 24 2020 - 03:34:16 EST



On Fri, Jun 19, 2020 at 19:20:11 +0200, Qais Yousef <qais.yousef@xxxxxxx> wrote...

[...]

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4265861e13e9..9ab22f699613 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -793,6 +793,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
> /* All clamps are required to be less or equal than these values */
> static struct uclamp_se uclamp_default[UCLAMP_CNT];
>
> +/*
> + * This static key is used to reduce the uclamp overhead in the fast path. It
> + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> + *
> + * This allows users to continue to enable uclamp in their kernel config with
> + * minimum uclamp overhead in the fast path.
> + *
> + * As soon as userspace modifies any of the uclamp knobs, the static key is
> + * disabled, since we have actual users that make use of the uclamp
> + * functionality.
> + *
> + * The knobs that would disable this static key are:
> + *
> + * * A task modifying its uclamp value with sched_setattr().
> + * * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> + * * An admin modifying the cgroup cpu.uclamp.{min, max}
> + */
> +static DEFINE_STATIC_KEY_TRUE(sched_uclamp_unused);

I would personally prefer a non-negated semantics.

Why not use 'sched_uclamp_enabled'?
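Something like this, just as a sketch on top of your patch (untested), with
the key off by default and flipped on by the first userspace uclamp user:

    /* Off by default; enabled by the first userspace uclamp user */
    static DEFINE_STATIC_KEY_FALSE(sched_uclamp_enabled);

so that the fast-path guards read:

    if (!static_branch_unlikely(&sched_uclamp_enabled))
            return;

and the slow paths do static_branch_enable(&sched_uclamp_enabled) instead of
static_branch_disable(&sched_uclamp_unused).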

> +
> /* Integer rounded range for each bucket */
> #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>
> @@ -993,9 +1012,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> lockdep_assert_held(&rq->lock);
>
> bucket = &uc_rq->bucket[uc_se->bucket_id];
> - SCHED_WARN_ON(!bucket->tasks);
> - if (likely(bucket->tasks))
> - bucket->tasks--;
> +
> + /*
> + * This could happen if sched_uclamp_unused was disabled while the
> + * current task was running, hence we could end up with an unbalanced
> + * call to uclamp_rq_dec_id().
> + */
> + if (unlikely(!bucket->tasks))
> + return;
> +
> + bucket->tasks--;

Since you are not really changing the logic above, why change the code?

The SCHED_WARN_ON()/if (likely()) pair is a defensive programming thing.
I understand that SCHED_WARN_ON() can now be misleading because of the
unbalanced calls but... why not just remove it?

Maybe also add the comment, but I don't see a valid reason to change the
code if the functionality is not changing.


> uc_se->active = false;
>
> /*
> @@ -1031,6 +1057,13 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> {
> enum uclamp_id clamp_id;
>
> + /*
> + * Avoid any overhead until uclamp is actually used by the userspace.
> + * Including the potential JMP if we use static_branch_unlikely()

The comment above (mentioning static_branch_unlikely()) does not seem to
match the code, which actually uses static_branch_likely()?

> + */
> + if (static_branch_likely(&sched_uclamp_unused))
> + return;

Moreover, something like:

    if (!static_branch_unlikely(&sched_uclamp_enabled))
            return;

isn't that just good enough?

> +
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;

Since we already have these per sched_class gates, I'm wondering if it
could make sense to just re-purpose them.

The problem with the static key is that if just one RT task opts in, CFS
will still pay the overhead, and vice versa.

So, an alternative approach could be to opt in sched classes on-demand.

The above if (unlikely(...)) is not exactly as cheap as a static key, but I
assume we agree the overheads we are tackling here are nothing compared to
that check, aren't they?


> @@ -1046,6 +1079,13 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> {
> enum uclamp_id clamp_id;
>
> + /*
> + * Avoid any overhead until uclamp is actually used by the userspace.
> + * Including the potential JMP if we use static_branch_unlikely()
> + */
> + if (static_branch_likely(&sched_uclamp_unused))
> + return;
> +
> if (unlikely(!p->sched_class->uclamp_enabled))
> return;
>
> @@ -1155,9 +1195,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> update_root_tg = true;
> }
>
> - if (update_root_tg)
> + if (update_root_tg) {
> uclamp_update_root_tg();
>
> + if (static_branch_unlikely(&sched_uclamp_unused))
> + static_branch_disable(&sched_uclamp_unused);
> + }
> +

Can we move the above into a function?

Something similar to set_schedstats(bool); what about uclamp_enable(bool)?
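Just to illustrate what I have in mind, a sketch reusing your key name
(untested):

    static void uclamp_enable(bool enable)
    {
            /*
             * static_branch_{enable,disable}() are no-ops when the key
             * is already in the requested state.
             */
            if (enable)
                    static_branch_disable(&sched_uclamp_unused);
            else
                    static_branch_enable(&sched_uclamp_unused);
    }

so the three call sites in this patch collapse into a single,
self-documenting uclamp_enable(true).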

> /*
> * We update all RUNNABLE tasks only when task groups are in use.
> * Otherwise, keep it simple and do just a lazy update at each next
> @@ -1221,6 +1265,9 @@ static void __setscheduler_uclamp(struct task_struct *p,
> if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
> return;
>
> + if (static_branch_unlikely(&sched_uclamp_unused))
> + static_branch_disable(&sched_uclamp_unused);
> +
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
> attr->sched_util_min, true);
> @@ -7315,6 +7362,9 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> if (req.ret)
> return req.ret;
>
> + if (static_branch_unlikely(&sched_uclamp_unused))
> + static_branch_disable(&sched_uclamp_unused);
> +
> mutex_lock(&uclamp_mutex);
> rcu_read_lock();

Best,
Patrick