Re: [PATCH v3 5/6] sched: Add sched_overutilized tracepoint

From: Peter Zijlstra
Date: Mon Jun 17 2019 - 11:55:28 EST


On Tue, Jun 04, 2019 at 12:14:58PM +0100, Qais Yousef wrote:
> The new tracepoint allows us to track the changes in overutilized
> status.
>
> Overutilized status is associated with EAS. It indicates that the system
> is in high performance state. EAS is disabled when the system is in this
> state since there's not much energy savings while high performance tasks
> are pushing the system to the limit and it's better to default to the
> spreading behavior of the scheduler.
>
> This tracepoint helps understanding and debugging the conditions under
> which this happens.
>
> Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> ---
> include/trace/events/sched.h | 4 ++++
> kernel/sched/fair.c | 11 +++++++++--
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index c7dd9bc7f001..edd96e04049f 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -621,6 +621,10 @@ DECLARE_TRACE(pelt_se_tp,
>  	TP_PROTO(struct sched_entity *se),
>  	TP_ARGS(se));
>
> +DECLARE_TRACE(sched_overutilized_tp,
> +	TP_PROTO(int overutilized, struct root_domain *rd),
> +	TP_ARGS(overutilized, rd));
> +

Strictly speaking you only need @rd :-)

> #endif /* _TRACE_SCHED_H */
>
> /* This part must be outside protection */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8e0015ebf109..e2418741608e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5179,8 +5179,10 @@ static inline bool cpu_overutilized(int cpu)
>
>  static inline void update_overutilized_status(struct rq *rq)
>  {
> -	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
> +	if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
>  		WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> +		trace_sched_overutilized_tp(1, rq->rd);
> +	}
>  }
>  #else
>  static inline void update_overutilized_status(struct rq *rq) { }
> @@ -8542,8 +8544,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>
>  		/* Update over-utilization (tipping point, U >= 0) indicator */
>  		WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> +
> +		trace_sched_overutilized_tp(!!(sg_status & SG_OVERUTILIZED), rd);
>  	} else if (sg_status & SG_OVERUTILIZED) {
> -		WRITE_ONCE(env->dst_rq->rd->overutilized, SG_OVERUTILIZED);
> +		struct root_domain *rd = env->dst_rq->rd;
> +
> +		WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> +		trace_sched_overutilized_tp(1, rd);
>  	}
>  }

But I figure since we need both values anyway, this isn't too much of a
bother.

I'm going to flip the argument order though.
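
To illustrate the "you only need @rd" remark: every call site writes
rd->overutilized right before firing the tracepoint, so a probe handed only
the root domain could read the flag back itself. What follows is a userspace
sketch of that idea, not kernel code. The reduced struct root_domain, the
helper names probe_overutilized() and set_overutilized(), and the value given
to SG_OVERUTILIZED here are all assumptions made for the example.

```c
#include <stdbool.h>

/* Assumed flag value, for illustration only. */
#define SG_OVERUTILIZED 0x2

/* Hypothetical stand-in for the kernel's struct root_domain, cut down to
 * the single field this discussion is about. */
struct root_domain {
	int overutilized;
};

/* Hypothetical probe that receives only rd and derives the status from it,
 * instead of taking a separate 'overutilized' argument. */
static bool probe_overutilized(const struct root_domain *rd)
{
	return rd->overutilized != 0;
}

/* Mirrors the shape of the patched call sites: store the flag first, then
 * "fire" the probe with rd alone. Returns what the probe observed. */
static bool set_overutilized(struct root_domain *rd, int sg_status)
{
	rd->overutilized = sg_status & SG_OVERUTILIZED;
	return probe_overutilized(rd);
}
```

The probe sees the same value the caller would otherwise have passed
explicitly, which is why the extra argument is a convenience rather than a
necessity.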