Re: [RFC PATCH 0/5] CFS load tracking trace events
From: Vincent Guittot
Date: Tue Mar 28 2017 - 06:07:40 EST
On 28 March 2017 at 08:35, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> This patch-set introduces trace events for load (and utilization)
> tracking for the following three cfs scheduler bricks: cfs_rq,
> sched_entity and task_group.
>
> I've decided to sent it out because people are discussing problems with
> PELT related functionality and as a base for the related discussion next
> week at the OSPM-summit in Pisa.
>
> The requirements for the design are:
>
> (1) Support for all combinations related to the following kernel
> configuration options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
> CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.
>
> (2) All key=value pairs have to appear in all combinations of the
> kernel configuration options mentioned under (1).
>
> (3) Stable interface, i.e. only use keys for a key=value pairs which
> are independent from the underlying (load tracking) implementation.
>
> (4) Minimal invasive to the actual cfs scheduler code, i.e. the trace
> event should not have to be guarded by an if condition (e.g.
> entity_is_task(se) to distinguish between task and task_group)
> or kernel configuration switch.
>
> The trace events only expose one load (and one utilization) key=value
> pair besides those used to identify the cfs scheduler brick. They do
> not provide any internal implementation details of the actual (PELT)
> load-tracking mechanism.
> This limitation might be too much since for a cfs_rq we can only trace
> runnable load (cfs_rq->runnable_load_avg) or runnable/blocked load
> (cfs_rq->avg.load_avg).
> Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
> are not traced but could be easily added.
>
> The following keys are used to identify the cfs scheduler brick:
>
> (1) Cpu number the cfs scheduler brick is attached to.
>
> (2) Task_group path and (css) id.
>
> (3) Task name and pid.
Do you really need both path/name and id/pid ?
The path/name looks quite intrusive so can't we just use id/pid ?
>
> In case a key does not apply due to an unset Kernel configuration option
> or the fact that a sched_entity can represent either a task or a
> task_group its value is set to an invalid default:
>
> (1) Task_group path: "(null)"
>
> (2) Task group id: -1
>
> (3) Task name: "(null)"
>
> (4) Task pid: -1
>
> Load tracking trace events are placed into kernel/sched/fair.c:
>
> (1) for cfs_rq:
>
> - In PELT core function __update_load_avg().
>
> - During sched_entity attach/detach.
>
> - During sched_entity load/util propagation down the task_group
> hierarchy.
>
> (2) for sched_entity:
>
> - In PELT core function __update_load_avg().
>
> - During sched_entity load/util propagation down the task_group
> hierarchy.
>
> (3) for task_group:
>
> - In its PELT update function.
>
> An alternative for using __update_load_avg() would be to put trace
> events into update_cfs_rq_load_avg() for cfs_rq's and into
> set_task_rq_fair(), update_load_avg(), sync_entity_load_avg() for
> sched_entities. This would also get rid of the if(cfs_rq)/else condition
> in __update_load_avg().
>
> These trace events still look a bit fragile.
> First of all, this patch-set has to use cfs specific data structures
> in the global task scheduler trace file include/trace/events/sched.h.
> And second, the accessor-functions (rq_of(), task_of(), etc.) are
> private to the cfs scheduler class. In case they would be public these
> trace events would be easier to code. That said, group_cfs_rq() is
> already made public by this patch-stack.
>
> This version bases on tip/sched/core as of yesterday (bc4278987e38). It
> has been compile tested on ~160 configurations via 0day's kbuild test
> robot.
>
> Dietmar Eggemann (5):
> sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
> sched/events: Introduce cfs_rq load tracking trace event
> sched/fair: Export group_cfs_rq()
> sched/events: Introduce sched_entity load tracking trace event
> sched/events: Introduce task_group load tracking trace event
>
> include/linux/sched.h | 10 +++
> include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/autogroup.c | 2 -
> kernel/sched/autogroup.h | 2 -
> kernel/sched/fair.c | 26 ++++----
> 5 files changed, 167 insertions(+), 16 deletions(-)
>
> --
> 2.11.0
>