Re: [RFC PATCH v4 5/7] sched, tracing: add to report task state in symbolic chars

From: Ze Gao
Date: Wed Aug 02 2023 - 22:39:23 EST


On Wed, Aug 2, 2023 at 11:07 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Wed, 2 Aug 2023 08:10:00 -0400
> Ze Gao <zegao2021@xxxxxxxxx> wrote:
>
> > From: Ze Gao <zegao2021@xxxxxxxxx>
> >
> > Internal representations of task state are likely to be changed
> > or ordered, and reporting them to userspace without exporting
> > them as part of API is basically wrong, which can easily break
> > a userspace observability tool as kernel evolves. For example,
> > perf suffers from this and still reports wrong states as of this
> > writing.
> >
> > OTOH, some masqueraded states like TASK_REPORT_IDLE and
> > TASK_REPORT_MAX are also reported inadvertently, which confuses
> > things even more and most userspace tools do not even take them
> > into consideration.
> >
> > So add a new variable in company with the old raw value to
> > report task state in symbolic chars, which are self-explaining
> > and no further translation is needed. Of course this does not
> > break any userspace tool.
> >
> > Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use
> > the old conventions for the rest.
> >
> > Signed-off-by: Ze Gao <zegao@xxxxxxxxxxx>
> > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
> > Acked-by: Ian Rogers <irogers@xxxxxxxxxx>
> > ---
> > include/trace/events/sched.h | 44 ++++++++++++++++++++++--------------
> > 1 file changed, 27 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> > index 7d34db20b2c6..1c7b94793495 100644
> > --- a/include/trace/events/sched.h
> > +++ b/include/trace/events/sched.h
> > @@ -6,6 +6,7 @@
> > #define _TRACE_SCHED_H
> >
> > #include <linux/kthread.h>
> > +#include <linux/sched.h>
> > #include <linux/sched/numa_balancing.h>
> > #include <linux/tracepoint.h>
> > #include <linux/binfmts.h>
> > @@ -214,6 +215,27 @@ static inline int __trace_sched_switch_state(bool preempt,
> >
> > return state ? (1 << (state - 1)) : state;
> > }
> > +
> > +static inline char __trace_sched_switch_state_char(bool preempt,
> > + unsigned int prev_state,
> > + struct task_struct *p)
> > +{
> > + long state;
> > +
> > +#ifdef CONFIG_SCHED_DEBUG
> > + BUG_ON(p != current);
>
> BUG? Why not WARN_ON()?

I directly copied it from __trace_sched_switch_state since they
are very similar. I had doubt on this too but decided to keep it in
case people want to be 100% sure that the current task is exactly
the one that is being switched, otherwise it's a fatal problem for
scheduler at the point where trace_sched_switch is called.

If you think WARN_ON_ONCE is more appropriate, I can fix both
in v6.

Thoughts?

Regards,
Ze