Re: [RFC PATCH 10/16] sched/qos: Add a new sched-qos interface
From: John Stultz
Date: Wed Nov 27 2024 - 20:51:03 EST
On Tue, Aug 20, 2024 at 9:36 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>
> The need to describe the conflicting demands of various workloads has
> never been higher. Both hardware and software have moved rapidly in the
> past decade, system usage has become more diverse, and the number of
> workloads expected to run on the same machine, whether in Mobile or
> Server markets, has created a big dilemma over how to better manage
> those requirements.
>
> The problem is that we lack mechanisms to allow these workloads to
> describe what they need, and then let the kernel make a best effort to
> manage those demands transparently, based on the hardware it is running
> on and the current system state.
>
> Examples of conflicting requirements that come up frequently:
>
> 1. Improve wake up latency for SCHED_OTHER. Many tasks end up
> using SCHED_FIFO/SCHED_RR to compensate for this shortcoming.
> RT tasks lack power management and fairness and can be hard
> and error prone to use correctly and portably.
>
> 2. Prefer spreading vs prefer packing on wake up for a group of
> tasks. Geekbench-like workloads would benefit from
> parallelising on different CPUs. hackbench type of workloads
> can benefit from waking up on the same CPU or on a CPU that is
> closer in the cache hierarchy.
>
> 3. Nice values for SCHED_OTHER are system wide and require
> privileges. Many workloads would like a way to set relative
> nice values so their tasks can preempt each other, but neither
> impact nor be impacted by tasks belonging to different
> workloads on the system.
>
> 4. Provide a way to tag some tasks as 'background' to keep them
> out of the way. SCHED_IDLE is too strong for some of these
> tasks but yet they can be computationally heavy. Example
> tasks are garbage collectors. Their work is both important
> and not important.
>
> 5. Provide a way to improve DVFS/upmigration rampup time for
> specific tasks that are bursty in nature and highly
> interactive.
>
> Whether any of these use cases warrants an additional QoS hint is
> something to be discussed individually. But the main point is to
> introduce an interface that can be extended to cater for those
> requirements and potentially more. rampup_multiplier, to improve
> DVFS/upmigration for bursty tasks, will be the first user in a later patch.
>
> It is desirable to have apps (and benchmarks!) use this interface
> directly for optimal perf/watt. But in the absence of such support, it
> should be possible to write a userspace daemon to monitor workloads and
> apply these QoS hints on the apps' behalf, based on analysis done by
> anyone interested in improving the performance of those workloads.
>
> Signed-off-by: Qais Yousef <qyousef@xxxxxxxxxxx>
> ---
...
> diff --git a/tools/perf/trace/beauty/include/uapi/linux/sched.h b/tools/perf/trace/beauty/include/uapi/linux/sched.h
> index 3bac0a8ceab2..67ef99f64ddc 100644
> --- a/tools/perf/trace/beauty/include/uapi/linux/sched.h
> +++ b/tools/perf/trace/beauty/include/uapi/linux/sched.h
> @@ -102,6 +102,9 @@ struct clone_args {
> __aligned_u64 set_tid_size;
> __aligned_u64 cgroup;
> };
> +
> +enum sched_qos_type {
> +};
> #endif
>
> #define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
> @@ -132,6 +135,7 @@ struct clone_args {
> #define SCHED_FLAG_KEEP_PARAMS 0x10
> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> +#define SCHED_FLAG_QOS 0x80
>
Hey Qais,
Just a heads up: it seems SCHED_FLAG_QOS needs to be added to
SCHED_FLAG_ALL for the code in later patches to be reachable.
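
To illustrate the issue, here is a rough userspace-only sketch, not kernel
code. The existing flag values and the shape of SCHED_FLAG_ALL mirror
include/uapi/linux/sched.h as I understand it; SCHED_FLAG_ALL_OLD,
SCHED_FLAG_ALL_NEW and check_sched_flags() are hypothetical names made up
for this example, and the real check in the sched_setattr() path is
simplified here (it also tolerates the kernel-internal SCHED_FLAG_SUGOV).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Existing flags, plus the one this patch adds. */
#define SCHED_FLAG_RESET_ON_FORK	0x01
#define SCHED_FLAG_RECLAIM		0x02
#define SCHED_FLAG_DL_OVERRUN		0x04
#define SCHED_FLAG_KEEP_POLICY		0x08
#define SCHED_FLAG_KEEP_PARAMS		0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x40
#define SCHED_FLAG_QOS			0x80

#define SCHED_FLAG_KEEP_ALL	(SCHED_FLAG_KEEP_POLICY | \
				 SCHED_FLAG_KEEP_PARAMS)

#define SCHED_FLAG_UTIL_CLAMP	(SCHED_FLAG_UTIL_CLAMP_MIN | \
				 SCHED_FLAG_UTIL_CLAMP_MAX)

/* Hypothetical name: the mask as it stands without the new flag. */
#define SCHED_FLAG_ALL_OLD	(SCHED_FLAG_RESET_ON_FORK	| \
				 SCHED_FLAG_RECLAIM		| \
				 SCHED_FLAG_DL_OVERRUN		| \
				 SCHED_FLAG_KEEP_ALL		| \
				 SCHED_FLAG_UTIL_CLAMP)

/* Hypothetical name: the mask with SCHED_FLAG_QOS folded in. */
#define SCHED_FLAG_ALL_NEW	(SCHED_FLAG_ALL_OLD | SCHED_FLAG_QOS)

/* The new bit must not collide with any existing flag. */
static_assert((SCHED_FLAG_ALL_OLD & SCHED_FLAG_QOS) == 0,
	      "SCHED_FLAG_QOS overlaps an existing flag");

/*
 * Hypothetical helper modelling the early validity check: any bit
 * outside the allowed mask is rejected before later code can see it.
 */
int check_sched_flags(uint64_t flags, uint64_t allowed)
{
	return (flags & ~allowed) ? -EINVAL : 0;
}
```

With the old mask, check_sched_flags(SCHED_FLAG_QOS, SCHED_FLAG_ALL_OLD)
returns -EINVAL, so a caller setting the flag never gets as far as the QoS
handling; with the new mask the same call returns 0.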
thanks
-john