Re: [PATCH v3 5/6] sched: Add sched_overutilized tracepoint

From: Qais Yousef
Date: Mon Jun 17 2019 - 12:36:20 EST


On 06/17/19 17:50, Peter Zijlstra wrote:
> On Tue, Jun 04, 2019 at 12:14:58PM +0100, Qais Yousef wrote:
> > The new tracepoint allows us to track the changes in overutilized
> > status.
> >
> > Overutilized status is associated with EAS. It indicates that the system
> > is in high performance state. EAS is disabled when the system is in this
> > state since there's not much energy savings while high performance tasks
> > are pushing the system to the limit and it's better to default to the
> > spreading behavior of the scheduler.
> >
> > This tracepoint helps understanding and debugging the conditions under
> > which this happens.
> >
> > Signed-off-by: Qais Yousef <qais.yousef@xxxxxxx>
> > ---
> > include/trace/events/sched.h | 4 ++++
> > kernel/sched/fair.c | 11 +++++++++--
> > 2 files changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > index c7dd9bc7f001..edd96e04049f 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -621,6 +621,10 @@ DECLARE_TRACE(pelt_se_tp,
> > TP_PROTO(struct sched_entity *se),
> > TP_ARGS(se));
> >
> > +DECLARE_TRACE(sched_overutilized_tp,
> > + TP_PROTO(int overutilized, struct root_domain *rd),
> > + TP_ARGS(overutilized, rd));
> > +
>
> strictly speaking you only need @rd :-)

Yes. Sorry, my brain was hardwired to think that since this is an overutilized
event we need to pass this info explicitly :-)
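
To illustrate why @rd alone would be enough, here is a minimal user-space
sketch. It is not kernel code: the struct is a hypothetical stand-in with only
the one field that matters here, the helper name rd_is_overutilized() is made
up for illustration, and the SG_OVERUTILIZED value is assumed to mirror the
flag in kernel/sched/sched.h.

```c
#include <stdbool.h>

/* Assumed to mirror the flag in kernel/sched/sched.h. */
#define SG_OVERUTILIZED 0x2

/* Hypothetical stand-in for the kernel's struct root_domain. */
struct root_domain {
	int overutilized;
};

/*
 * Hypothetical helper: a probe that only receives @rd can still recover
 * the overutilized state, because the flag lives in the root domain.
 * Real kernel code would read the field with READ_ONCE().
 */
static bool rd_is_overutilized(const struct root_domain *rd)
{
	return (rd->overutilized & SG_OVERUTILIZED) != 0;
}
```

Passing the value explicitly just saves the probe from doing this lookup.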

>
> > #endif /* _TRACE_SCHED_H */
> >
> > /* This part must be outside protection */
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8e0015ebf109..e2418741608e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5179,8 +5179,10 @@ static inline bool cpu_overutilized(int cpu)
> >
> > static inline void update_overutilized_status(struct rq *rq)
> > {
> > - if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu))
> > + if (!READ_ONCE(rq->rd->overutilized) && cpu_overutilized(rq->cpu)) {
> > WRITE_ONCE(rq->rd->overutilized, SG_OVERUTILIZED);
> > + trace_sched_overutilized_tp(1, rq->rd);
> > + }
> > }
> > #else
> > static inline void update_overutilized_status(struct rq *rq) { }
> > @@ -8542,8 +8544,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
> >
> > /* Update over-utilization (tipping point, U >= 0) indicator */
> > WRITE_ONCE(rd->overutilized, sg_status & SG_OVERUTILIZED);
> > +
> > + trace_sched_overutilized_tp(!!(sg_status & SG_OVERUTILIZED), rd);
> > } else if (sg_status & SG_OVERUTILIZED) {
> > - WRITE_ONCE(env->dst_rq->rd->overutilized, SG_OVERUTILIZED);
> > + struct root_domain *rd = env->dst_rq->rd;
> > +
> > + WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
> > + trace_sched_overutilized_tp(1, rd);
> > }
> > }
>
> But I figure since we need both values anyway, this isn't too much of a
> bother.
>
> I'm going to flip the argument order though.

Sounds good to me. The good news is that changing the signature should be
doable in the future if we feel the need to evolve it :-)
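
For reference, a rough user-space sketch of what a probe would look like with
the argument order flipped (rd first, then the state). Everything here is a
hypothetical mock-up for illustration: register_probe(), trace_overutilized()
and count_probe() are made-up names and not the kernel's tracepoint machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct root_domain. */
struct root_domain {
	int overutilized;
};

/* Probe signature with the flipped argument order: rd first. */
typedef void (*overutilized_probe_t)(struct root_domain *rd, bool overutilized);

/* NULL means no probe is attached. */
static overutilized_probe_t registered_probe;

/* Mock registration; a single slot is enough for this sketch. */
static void register_probe(overutilized_probe_t probe)
{
	registered_probe = probe;
}

/* Mock call site: forwards the event only if a probe is attached. */
static void trace_overutilized(struct root_domain *rd, bool overutilized)
{
	if (registered_probe)
		registered_probe(rd, overutilized);
}

/* Example probe: counts events and remembers the last reported state. */
static int events_seen;
static bool last_state;

static void count_probe(struct root_domain *rd, bool overutilized)
{
	(void)rd; /* unused in this example */
	events_seen++;
	last_state = overutilized;
}
```

Only the probe prototype and the call sites depend on the argument order, so
a later change to the signature stays contained to those places.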

Thanks

--
Qais Yousef