[PATCH v4 0/8] sched: support schedstats for RT sched class

From: Yafang Shao
Date: Sun Sep 05 2021 - 10:36:10 EST

Why do we need schedstats?

schedstats is a useful feature for thread-level latency analysis. Our
use case is as follows:

Userspace Code Scope Profiler

user_func_abc(); <---- uprobe_scope_begin() get start schedstats
user_func_xyz(); <---- uprobe_scope_end() get end schedstats

From the difference (end - begin), we can get the following latency
breakdown for a specific user scope:

scope_latency = Wait + Sleep + Blocked [1] + Run (stime + utime)

Without schedstats, we would have to trace the heavyweight
sched:sched_switch tracepoint and do much more work.

[1]. With patch #4, and with sum_block_runtime not included in sum_sleep_runtime.

Support schedstats for RT sched class

If we want to use the schedstats facility to trace other sched classes, we
should make it independent of the fair sched class. struct sched_statistics
holds the scheduler statistics of a task_struct or a task_group, so we can
move it into struct task_struct and struct task_group to achieve this goal.

After this patchset, schedstats are organized as follows:

struct task_struct {
	...
	struct sched_entity se;
	struct sched_rt_entity rt;
	struct sched_dl_entity dl;
	struct sched_statistics stats;
	...
};

For task groups, schedstats is supported only for fair group scheduling,
and a new struct sched_entity_stats is introduced, as suggested by
Peter:

struct sched_entity_stats {
	struct sched_entity se;
	struct sched_statistics stats;
} __no_randomize_layout;

Then, given the se of a task_group, we can easily get its stats.

The sched_statistics members may be modified frequently when schedstats is
enabled. To avoid impacting unrelated data that may share a cacheline with
them, struct sched_statistics is defined as cacheline aligned.

As this patchset changes core scheduler structs, I measured its potential
performance impact with 'perf bench sched pipe', as suggested by Mel. The
results are below; all values are in usecs/op.
                            Before      After
kernel.sched_schedstats=0   5.2~5.4     5.2~5.4
kernel.sched_schedstats=1   5.3~5.5     5.3~5.5
[These numbers differ slightly from those in the earlier version because
my old test machine was destroyed, so I had to use a different test
machine.]

There is almost no impact on scheduler performance.

Users can get schedstats information for RT tasks in the same way as for
the fair sched class. For example:
          fair                  RT
   /proc/[pid]/sched     /proc/[pid]/sched

schedstats is not supported for RT groups.

The sched:sched_stat_{wait, sleep, iowait, blocked} tracepoints can
be used to trace RT tasks as well.

Support schedstats for any other sched classes

After this patchset, it is very easy to extend schedstats to any other
sched class. The deadline sched class is also supported in this patchset.

Changes Since v3:
Various code improvements per Peter:
- don't support schedstats for rt group
- introduce struct sched_entity_stats for fair group
- change the position of 'struct sched_statistics stats'
- fix indentation issues
- change the output format in /proc/[pid]/sched
- add the usecase of schedstats
- support schedstats for deadline task
- and other suggestions

Changes Since v2:
- Fix the output format in /proc/[pid]/sched
- Rebase on the latest code
- Redo the performance test

Changes since v1:
- Fix the build failure reported by kernel test robot.
- Add the performance data with 'perf bench sched pipe', suggested by Mel.
- Make the struct sched_statistics cacheline aligned.
- Introduce task block time in schedstats

Changes since RFC:
- improvement of schedstats helpers, per Mel.
- make struct schedstats independent of fair sched class

Yafang Shao (8):
sched, fair: use __schedstat_set() in set_next_entity()
sched: make struct sched_statistics independent of fair sched class
sched: make schedstats helpers independent of fair sched class
sched: introduce task block time in schedstats
sched, rt: support sched_stat_runtime tracepoint for RT sched class
sched, rt: support schedstats for RT sched class
sched, dl: support sched_stat_runtime tracepoint for deadline sched
sched, dl: support schedstats for deadline sched class

include/linux/sched.h | 8 +-
kernel/sched/core.c | 25 +++---
kernel/sched/deadline.c | 99 +++++++++++++++++++++-
kernel/sched/debug.c | 97 +++++++++++----------
kernel/sched/fair.c | 177 +++++++++++----------------------------
kernel/sched/rt.c | 130 +++++++++++++++++++++++++++-
kernel/sched/stats.c | 104 +++++++++++++++++++++++
kernel/sched/stats.h | 49 +++++++++++
kernel/sched/stop_task.c | 4 +-
9 files changed, 500 insertions(+), 193 deletions(-)