[RFC PATCH 0/5] CFS load tracking trace events
From: Dietmar Eggemann
Date: Tue Mar 28 2017 - 02:36:55 EST
This patch-set introduces trace events for load (and utilization)
tracking for the following three cfs scheduler bricks: cfs_rq,
sched_entity and task_group.
I've decided to send it out because people are discussing problems with
PELT-related functionality, and to serve as a base for the related
discussion next week at the OSPM summit in Pisa.
The requirements for the design are:
(1) Support for all combinations related to the following kernel
configuration options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.
(2) All key=value pairs have to appear in all combinations of the
kernel configuration options mentioned under (1).
(3) Stable interface, i.e. only use keys for key=value pairs which
are independent of the underlying (load tracking) implementation.
(4) Minimally invasive to the actual cfs scheduler code, i.e. the trace
events should not have to be guarded by an if condition (e.g.
entity_is_task(se) to distinguish between a task and a task_group)
or a kernel configuration switch.
The trace events only expose one load (and one utilization) key=value
pair besides those used to identify the cfs scheduler brick. They do
not provide any internal implementation details of the actual (PELT)
load-tracking mechanism.
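As a rough illustration, the cfs_rq event could be shaped along the
following lines. This is a sketch only; the field layout, key names and
the handling of the task_group id/path are assumptions made here, not
the exact code from the patches:

```c
/*
 * Sketch of a cfs_rq load-tracking trace event for
 * include/trace/events/sched.h.  Field and key names are
 * illustrative; the actual patches define the exact layout.
 */
TRACE_EVENT(sched_load_cfs_rq,

	TP_PROTO(struct cfs_rq *cfs_rq),

	TP_ARGS(cfs_rq),

	TP_STRUCT__entry(
		__field(	int,		cpu	)
		__field(	int,		id	)
		__field(	unsigned long,	load	)
		__field(	unsigned long,	util	)
	),

	TP_fast_assign(
		__entry->cpu	= cpu_of(rq_of(cfs_rq));
		/* id/path resolution via cgroup_path()/autogroup_path()
		 * (or the -1/"(null)" defaults) is left out here */
		__entry->id	= -1;
		__entry->load	= cfs_rq->runnable_load_avg;
		__entry->util	= cfs_rq->avg.util_avg;
	),

	TP_printk("cpu=%d id=%d load=%lu util=%lu",
		  __entry->cpu, __entry->id, __entry->load, __entry->util)
);
```

Keeping the keys to cpu/id/load/util (plus path) is what makes the
interface stable: consumers never see PELT internals such as load_sum
or period_contrib.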
This limitation might be too strict, since for a cfs_rq we can trace
either runnable load (cfs_rq->runnable_load_avg) or runnable/blocked
load (cfs_rq->avg.load_avg), but not both at once.
Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
are not traced but could be easily added.
The following keys are used to identify the cfs scheduler brick:
(1) CPU number the cfs scheduler brick is attached to.
(2) Task_group path and (css) id.
(3) Task name and pid.
If a key does not apply, either due to an unset kernel configuration
option or because a sched_entity can represent either a task or a
task_group, its value is set to an invalid default:
(1) Task_group path: "(null)"
(2) Task group id: -1
(3) Task name: "(null)"
(4) Task pid: -1
Load tracking trace events are placed into kernel/sched/fair.c:
(1) for cfs_rq:
- In PELT core function __update_load_avg().
- During sched_entity attach/detach.
- During sched_entity load/util propagation down the task_group
hierarchy.
(2) for sched_entity:
- In PELT core function __update_load_avg().
- During sched_entity load/util propagation down the task_group
hierarchy.
(3) for task_group:
- In its PELT update function.
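Sketched against the __update_load_avg() signature in tip/sched/core of
that time, the cfs_rq and sched_entity hooks from (1) and (2) would sit
roughly as follows. The event names and the container_of() lookup are
assumptions of this sketch, not necessarily the patch code:

```c
/* kernel/sched/fair.c -- sketch of the hook placement; the PELT
 * math is elided, only the trace calls are shown. */
static __always_inline int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
		  unsigned long weight, int running, struct cfs_rq *cfs_rq)
{
	int decayed = 0;

	/* ... existing PELT decay and accumulation into *sa ... */

	if (cfs_rq)
		trace_sched_load_cfs_rq(cfs_rq);
	else
		trace_sched_load_se(container_of(sa,
						 struct sched_entity, avg));

	return decayed;
}
```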
An alternative to using __update_load_avg() would be to put trace
events into update_cfs_rq_load_avg() for cfs_rq's and into
set_task_rq_fair(), update_load_avg() and sync_entity_load_avg() for
sched_entities. This would also get rid of the if (cfs_rq)/else
condition in __update_load_avg().
These trace events still look a bit fragile.
First of all, this patch-set has to use cfs-specific data structures
in the global task scheduler trace file include/trace/events/sched.h.
And second, the accessor functions (rq_of(), task_of(), etc.) are
private to the cfs scheduler class. If they were public, these
trace events would be easier to code. That said, group_cfs_rq() is
already made public by this patch-stack.
This version is based on tip/sched/core as of yesterday (bc4278987e38).
It has been compile-tested on ~160 configurations via 0day's kbuild
test robot.
Dietmar Eggemann (5):
sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
sched/events: Introduce cfs_rq load tracking trace event
sched/fair: Export group_cfs_rq()
sched/events: Introduce sched_entity load tracking trace event
sched/events: Introduce task_group load tracking trace event
include/linux/sched.h | 10 +++
include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
kernel/sched/autogroup.c | 2 -
kernel/sched/autogroup.h | 2 -
kernel/sched/fair.c | 26 ++++----
5 files changed, 167 insertions(+), 16 deletions(-)
--
2.11.0